CN109145970A - Image-based question answering processing method and apparatus, electronic device and storage medium - Google Patents
Image-based question answering processing method and apparatus, electronic device and storage medium
- Publication number
- CN109145970A CN109145970A CN201810884844.5A CN201810884844A CN109145970A CN 109145970 A CN109145970 A CN 109145970A CN 201810884844 A CN201810884844 A CN 201810884844A CN 109145970 A CN109145970 A CN 109145970A
- Authority
- CN
- China
- Prior art keywords
- convolution
- feature
- group
- image
- dynamic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
This disclosure relates to an image-based question answering processing method and apparatus, an electronic device, and a storage medium. The method includes: obtaining an image feature of a target image, and obtaining a question feature of a target question; determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature; and determining, based on the fused feature, the answer to the target question with respect to the target image. Embodiments of the present disclosure can improve the prediction accuracy of the answer.
Description
Technical field
Embodiments of the present disclosure relate to the field of computer vision, and in particular to an image-based question answering processing method and apparatus, an electronic device, and a storage medium.
Background technique
Image recognition technology can be applied in many fields; for example, computer vision can be combined with language analysis to implement image captioning, retrieving images by language, visual question answering, and the like. Visual question answering means that, given a picture and an associated question, a visual question answering system can accurately provide a textual answer.
In existing research on visual question answering tasks, a bilinear pooling method must first be used to obtain a fused feature of the image feature and the question feature, and the final answer is then predicted based on that fused feature. However, this feature fusion is performed at a relatively late stage of the visual question answering model and cannot make effective use of the image feature and the question feature.
Summary of the invention
Embodiments of the present disclosure propose an image-based question answering processing method and apparatus, an electronic device, and a storage medium that can make effective use of image features and question features and improve recognition accuracy.
According to a first aspect of the embodiments of the present disclosure, an image-based question answering processing method is provided, including:
obtaining an image feature of a target image, and obtaining a question feature of a target question, where the target question is a question directed at the target image;
determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature; and
determining, based on the fused feature, the answer to the target question with respect to the target image.
In embodiments of the present disclosure, performing the convolution operation on the image feature with the dynamic convolution kernel to obtain the fused feature includes:
inputting the dynamic convolution kernel separately into N convolution modules, where the convolution modules are connected in sequence to form a linear connection structure and N is an integer greater than 1;
performing, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result; and
performing, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and obtaining the fused feature from the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
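The linear connection structure above can be sketched in a few lines. The following minimal NumPy sketch is an illustration under assumed shapes, not the patent's implementation: the same question-derived dynamic kernel is fed to every module, the first module convolves the image feature, each subsequent module convolves its predecessor's output, and the last output is taken as the fused feature. The 1-D "same"-padded convolution and all sizes are assumptions made for the example.

```python
import numpy as np

def conv1d(x, kernel):
    """'Same'-padded 1-D convolution of a feature vector with a kernel."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

def fuse(image_feature, dynamic_kernel, n_modules=3):
    """Linear connection structure: module 1 convolves the image feature,
    module i convolves the output of module i-1, and the N-th module's
    output is taken as the fused feature. Every module receives the same
    question-derived dynamic kernel."""
    out = image_feature
    for _ in range(n_modules):
        out = conv1d(out, dynamic_kernel)
    return out

image_feature = np.arange(8, dtype=float)      # stand-in image feature
dynamic_kernel = np.array([0.25, 0.5, 0.25])   # stand-in question-derived kernel
fused = fuse(image_feature, dynamic_kernel)
print(fused.shape)  # prints "(8,)"
```

Because the kernel carries the question information, every convolution in the chain mixes question and image information, which is the fusion mechanism the embodiment describes.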
In embodiments of the present disclosure, each convolution module includes at least one convolution unit group, and performing the convolution operation by each convolution module includes:
dividing the input data of the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding this convolution module;
distributing each group of feature sub-data to its corresponding convolution unit group, assigning the dynamic convolution kernel to at least one convolution unit group, and assigning a default convolution kernel to each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel; and
performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel.
In embodiments of the present disclosure, performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel includes:
selecting first-part data from first feature data obtained by the convolution unit group that performs the convolution operation with the dynamic convolution kernel, where the first feature data is the data generated by the convolution units that perform the convolution operation with the dynamic convolution kernel;
replacing second-part data of second feature data obtained by the convolution unit group that performs the convolution operation with the default convolution kernel with the first-part data, to obtain third feature data; and
obtaining the output result of the corresponding convolution module based on the first feature data and the third feature data.
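One convolution module of this kind can be sketched as follows. This is a hedged NumPy illustration: two unit groups, a half/half split of the input, and a swap size of two values are all assumptions made for the example. Group 0 convolves its feature sub-data with the question-derived dynamic kernel (first feature data); group 1 uses the default kernel (second feature data); part of the second feature data is then replaced by part of the first feature data to form the third feature data, and the module outputs both parts.

```python
import numpy as np

def conv1d(x, kernel):
    """'Same'-padded 1-D convolution (illustrative)."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

def conv_module(input_data, dynamic_kernel, default_kernel, swap=2):
    """One convolution module with two convolution unit groups.
    Group 0 convolves its feature sub-data with the question-derived
    dynamic kernel (first feature data); group 1 uses the default kernel
    (second feature data). The first `swap` values of the second feature
    data are replaced by the first `swap` values of the first feature
    data, giving the third feature data."""
    half = len(input_data) // 2
    sub0, sub1 = input_data[:half], input_data[half:]   # feature sub-data
    first = conv1d(sub0, dynamic_kernel)     # first feature data
    second = conv1d(sub1, default_kernel)    # second feature data
    third = second.copy()
    third[:swap] = first[:swap]              # swap in the first-part data
    return np.concatenate([first, third])

x = np.arange(8, dtype=float)
dyn = np.array([0.0, 1.0, 0.0])   # identity-like dynamic kernel (stand-in)
dft = np.array([1.0, 0.0, 0.0])   # shift-like default kernel (stand-in)
out = conv_module(x, dyn, dft)
print(out.shape)  # prints "(8,)"
```

The swap is what propagates question information into the branch that only has a fixed kernel, so every part of the module's output depends on the question.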
In embodiments of the present disclosure, each convolution module includes at least one convolution unit group, and performing the convolution operation by each convolution module includes:
dividing the input data of the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding this convolution module;
distributing each group of feature sub-data to its corresponding convolution unit group, and distributing the dynamic convolution kernel to every convolution unit group; and
performing, by each convolution unit group, the convolution operation on the input feature sub-data with the distributed dynamic convolution kernel.
In embodiments of the present disclosure, performing, by each convolution unit group, the convolution operation on the input feature sub-data with the distributed dynamic convolution kernel includes:
obtaining the feature data output by each convolution unit group; and
obtaining the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
In embodiments of the present disclosure, determining, based on the fused feature, the answer to the target question with respect to the target image includes:
performing analysis and recognition on the fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, determining, based on the fused feature, the answer to the target question with respect to the target image includes:
obtaining an attention weight matrix corresponding to the fused feature;
performing a weighted sum of the corresponding features of the fused feature based on the attention weight matrix; and
performing analysis and recognition on the weighted-and-summed fused feature using a preset classification model to obtain the answer to the target question.
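The attention-then-classify step above can be sketched as follows. In this NumPy illustration the attention weight matrix is produced by a softmax over per-position scores and the classifier is a single linear layer; both choices, and all shapes, are assumptions made for the example, not the patent's model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def answer_from_fusion(fused, attn_scores, class_weights):
    """Weight each position of the fused feature with the attention
    weight matrix, sum over positions, and classify the result."""
    attn = softmax(attn_scores)                      # attention weights (sum to 1)
    weighted = (attn[:, None] * fused).sum(axis=0)   # weighted sum of features
    logits = class_weights @ weighted                # stand-in classification model
    return int(np.argmax(logits))                    # index of the predicted answer

rng = np.random.default_rng(0)
fused = rng.normal(size=(5, 4))    # 5 positions, 4-dim fused feature
scores = rng.normal(size=5)        # attention scores for each position
W = rng.normal(size=(3, 4))        # 3 candidate answers
pred = answer_from_fusion(fused, scores, W)
print(pred)
```

The weighting lets positions of the fused feature that are more relevant to the question contribute more to the final classification.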
In embodiments of the present disclosure, the preset classification model includes a deep learning neural network model.
In embodiments of the present disclosure, determining the dynamic convolution kernel based on the question feature includes:
processing the question feature using a fully connected neural network to obtain the dynamic convolution kernel.
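The fully connected mapping can be sketched as a single linear layer whose flat output is reshaped into a kernel. All sizes here (a 16-dim question feature, a 3x3 kernel) are assumptions for illustration; the disclosure only states that the 1-D question feature becomes a multi-dimensional (e.g. 2-D or 3-D) kernel.

```python
import numpy as np

def dynamic_kernel_from_question(q_feat, W, b, kernel_shape=(3, 3)):
    """One fully connected layer maps the 1-D question feature to a flat
    vector, which is reshaped into a multi-dimensional dynamic kernel
    (the disclosure mentions 2-D or 3-D kernels)."""
    flat = W @ q_feat + b        # fully connected projection
    return flat.reshape(kernel_shape)

rng = np.random.default_rng(1)
q_feat = rng.normal(size=16)     # stand-in 1-D question feature
W = rng.normal(size=(9, 16))     # FC weights: 9 outputs for a 3x3 kernel
b = np.zeros(9)
k = dynamic_kernel_from_question(q_feat, W, b)
print(k.shape)  # prints "(3, 3)"
```

Because `W` and `b` are learned, different questions produce different kernels, which is what makes the kernel "dynamic".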
In embodiments of the present disclosure, obtaining the image feature of the target image and obtaining the question feature of the target question directed at the target image include:
extracting the image feature of the target image using a first model, and extracting the question feature of the target question using a second model.
In embodiments of the present disclosure, the first model includes a deep convolutional network model, and the second model includes a recurrent neural network model.
In embodiments of the present disclosure, the method further includes:
obtaining the target image from stored image data, and/or receiving a transmitted target image; and
receiving an input target question, and/or recognizing the target question in the target image.
According to a second aspect of the embodiments of the present disclosure, an image-based question answering processing apparatus is provided, including:
a feature obtaining module, configured to obtain an image feature of a target image and obtain a question feature of a target question, where the target question is a question directed at the target image;
a feature fusion module, configured to determine a dynamic convolution kernel based on the question feature, and perform a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature; and
a determining module, configured to determine, based on the fused feature, the answer to the target question with respect to the target image.
In embodiments of the present disclosure, the feature fusion module includes N convolution modules connected in sequence to form a linear connection structure, where N is an integer greater than 1; and
the feature fusion module is further configured to input the dynamic convolution kernel separately into the N convolution modules, perform, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result, and
perform, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and obtain the fused feature from the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
In embodiments of the present disclosure, each convolution module includes at least one convolution unit group;
the feature fusion module is further configured to divide the input data of each convolution module into at least one group of feature sub-data, distribute each group of feature sub-data to its corresponding convolution unit group, and assign the dynamic convolution kernel to at least one convolution unit group, so that each convolution unit group performs the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel;
where each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel is assigned a default convolution kernel; and
the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding this convolution module.
In embodiments of the present disclosure, the feature fusion module is further configured to select first-part data from first feature data obtained by the convolution unit group that performs the convolution operation with the dynamic convolution kernel, where the first feature data is the data generated by the convolution units that perform the convolution operation with the dynamic convolution kernel;
replace second-part data of second feature data obtained by the convolution unit group that performs the convolution operation with the default convolution kernel with the first-part data, to obtain third feature data; and
obtain the output result of the corresponding convolution module based on the first feature data and the third feature data.
In embodiments of the present disclosure, each convolution module includes at least one convolution unit group;
the feature fusion module is further configured to divide the input data of each convolution module into at least one group of feature sub-data, distribute each group of feature sub-data to its corresponding convolution unit group, and distribute the dynamic convolution kernel to every convolution unit group, so that each convolution unit group performs the convolution operation on the input feature sub-data with the distributed dynamic convolution kernel;
where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding this convolution module.
In embodiments of the present disclosure, the feature fusion module is further configured to obtain the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
In embodiments of the present disclosure, the determining module is further configured to perform analysis and recognition on the fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, the determining module is further configured to obtain an attention weight matrix corresponding to the fused feature, perform a weighted sum of the corresponding features of the fused feature based on the attention weight matrix, and perform analysis and recognition on the weighted-and-summed fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, the preset classification model includes a deep learning neural network model.
In embodiments of the present disclosure, the feature fusion module is further configured to process the question feature using a fully connected neural network to obtain the dynamic convolution kernel.
In embodiments of the present disclosure, the feature obtaining module is further configured to extract the image feature of the target image using a first model, and extract the question feature of the target question using a second model.
In embodiments of the present disclosure, the first model includes a deep convolutional network model, and the second model includes a recurrent neural network model.
In embodiments of the present disclosure, the apparatus further includes:
an input module, configured to obtain the target image from stored image data and/or receive a transmitted target image, and configured to receive an input target question and/or recognize the target question in the target image.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
where the processor is configured to perform the method of any one of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the method of any one of the above embodiments.
In embodiments of the present disclosure, after the image feature and the question feature are obtained, the dynamic convolution kernel is determined according to the question feature and used to perform the convolution operation on the image feature. Because the question feature has been fused into the dynamic convolution kernel, the generated fused feature can closely combine the image feature (visual feature) with the question information, which improves the recognition accuracy of the answer.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the disclosure.
Fig. 1 shows a flowchart of an image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 2 shows a block diagram of a convolution module according to an embodiment of the present disclosure;
Fig. 3 shows a flowchart of step S300 in the image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 4 shows a block diagram of convolution modules connected in sequence to form a linear connection structure according to an embodiment of the present disclosure;
Fig. 5 shows a flowchart of performing a convolution operation by each convolution module according to an embodiment of the present disclosure;
Fig. 6 shows a flowchart of determining, based on the fused feature, the answer to the target question with respect to the target image according to an embodiment of the present disclosure;
Fig. 7 shows a process schematic of the image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an image-based question answering processing apparatus according to an embodiment of the present disclosure;
Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" should not be construed as preferred over or advantageous to other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art will appreciate that the present disclosure may be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present disclosure.
It can be understood that the method embodiments mentioned in the present disclosure may, without departing from their principles and logic, be combined with one another to form combined embodiments; for reasons of space, the details are not repeated here.
In addition, the present disclosure also provides an image-based question answering processing apparatus, an electronic device, a computer-readable storage medium, and a program, any of which can be used to implement any of the image-based question answering processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Embodiments of the present disclosure provide an image-based question answering processing method that can be applied in the field of computer vision to handle image-based question answering. Embodiments of the present disclosure can determine, according to the obtained question feature, a dynamic convolution kernel for performing a convolution operation. Because the dynamic convolution kernel contains feature information related to the question, the question feature and the image feature can be effectively fused when the convolution operation is performed, which in turn can improve the recognition accuracy of the resulting visual feature and of the answer.
Fig. 1 shows a flowchart of an image-based question answering processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the image-based question answering processing method may include:
S100: obtaining an image feature of a target image, and obtaining a question feature of a target question, where the target question is a question directed at the target image;
S200: determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature;
S300: determining, based on the fused feature, the answer to the target question with respect to the target image.
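The three steps above can be sketched end to end. This toy NumPy sketch replaces the real feature extractors, kernel generator, and classifier with stand-ins (all function names, shapes, and the toy kernel are assumptions made for the example), but it preserves the S100 → S200 → S300 data flow described here.

```python
import numpy as np

def step_s100(image, question):
    """S100: obtain an image feature and a question feature
    (stand-ins for the CNN / RNN extractors in the disclosure)."""
    return image.mean(axis=-1), np.array([float(len(question))])

def step_s200(image_feature, question_feature):
    """S200: derive a dynamic kernel from the question feature, then
    convolve the image feature with it to obtain the fused feature."""
    k = np.array([question_feature[0], 1.0, question_feature[0]])
    k = k / k.sum()                            # toy question-derived kernel
    padded = np.pad(image_feature, 1)
    return np.array([np.dot(padded[i:i + 3], k)
                     for i in range(len(image_feature))])

def step_s300(fused_feature):
    """S300: a stand-in classifier mapping the fused feature to an answer."""
    return "yes" if fused_feature.sum() > 0 else "no"

image = np.ones((6, 3))            # toy 6-pixel, 3-channel image
answer = step_s300(step_s200(*step_s100(image, "is there a bike?")))
print(answer)  # prints "yes"
```

The point of the flow is that the question enters the pipeline in S200, inside the kernel, rather than being fused with the image feature only at the final classification stage.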
The image-based question answering processing method in embodiments of the present disclosure can obtain, from an acquired image and question, the answer to that question with respect to the image. The question may concern the quantity, color, state, or other information of an object in the image; for example, for an image containing bicycles, the question may be how many bicycles there are, or what color a bicycle is. The answer to the question with respect to the image can be recognized by the method provided by embodiments of the present disclosure. The above examples merely illustrate applications of embodiments of the present disclosure and do not limit them.
Embodiments of the present disclosure are described in detail below. First, a target image and a target question directed at the target image may be obtained. The target image includes at least one picture, and the target question includes at least one question; that is, the numbers of images and questions are not limited in embodiments of the present disclosure: one image may correspond to multiple questions, multiple images may correspond to one question, or other forms are possible. In all of these cases, embodiments of the present disclosure process the question feature and the image feature in essentially the same way.
The manner of obtaining the target image in embodiments of the present disclosure may include: obtaining the target image from stored image data, and/or receiving a transmitted target image. That is, the target image in embodiments of the present disclosure may be an image selected from stored image data, where selection information input by a user may be received through an input module and the selected target image determined according to the selection information; the input module may include at least one of a touch screen, a trackpad, a keyboard, an audio input device, and the like, and in other embodiments may include other devices for inputting and receiving the selection information. In addition, in embodiments of the present disclosure the target image may also be obtained by communicating with another electronic device, for example receiving a transmitted target image through a wired communication module and/or a wireless communication module, where the wireless communication module may transmit data by means such as Bluetooth, Wi-Fi, and NFC, and the wired communication module may transmit data via optical fiber, coaxial cable, or other wired cables.
In addition, the manner of obtaining the target question in embodiments of the present disclosure may include: receiving an input target question, and/or recognizing the target question in the target image. Likewise, the target question in embodiments of the present disclosure may be based on information input through the input module, or on information received through the wired communication module and/or the wireless communication module. Furthermore, the target question may also be contained in the target image, or the target question may be in graphic form; in these cases the target question in the picture can be recognized by means of text recognition.
After the target image and the target question are obtained, the image feature of the target image and the question feature of the target question can be obtained through step S100. The image feature and the question feature may each be represented in matrix form; the image feature may include parameters such as the gray value of each pixel of the target image, and the question feature may include the linguistic information in the target question, and so on.
In embodiments of the present disclosure, obtaining the image feature of the target image and obtaining the question feature of the target question directed at the target image may include: extracting the image feature of the target image using a first model, and extracting the question feature of the target question using a second model. The first model may include a deep convolutional network model and the second model may include a recurrent neural network model; that is, the feature information in the target image may be extracted with the deep convolutional network model to obtain the image feature, and the linguistic information in the target question may be extracted with the recurrent neural network model to obtain the question feature. The first model and the second model in embodiments of the present disclosure may also be other types of machine learning network models.
In addition, in other embodiments of the present disclosure, the image feature in the target image may also be recognized by a first preset algorithm; for example, the first preset algorithm may include an algorithm based on wavelet moments, an algorithm based on fractal features, and the like. Any method that can recognize the image feature in the target image can be applied to embodiments of the present disclosure. Meanwhile, the question feature in the target question may also be obtained by a second preset algorithm, where the second preset algorithm may include a text recognition algorithm, a text segmentation algorithm, and the like.
After the image feature and the question feature are obtained, step S200 can be executed; that is, the dynamic convolution kernel for performing the convolution operation can be determined based on the question feature. A convolution kernel arises when image processing is performed on a given input image: each pixel in the output image is a weighted average of the pixels in a small region of the input image, where the weights are defined by a function, and this function is called the convolution kernel. Embodiments of the present disclosure can determine the relevant parameters of the convolution kernel according to the question feature, ultimately forming a dynamic convolution kernel related to the question feature, so that when the convolution operation is performed on the image feature, the question feature and the image feature can be effectively fused; the resulting fused feature thus retains the spatial information of the target image while closely fusing in the question information.
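The convolution-kernel definition just given can be made concrete in a few lines. This is a plain 2-D "valid"-mode convolution in NumPy, an illustration of the general definition rather than the patent's dynamic kernel: each output pixel is the kernel-weighted average of a 3x3 region of the input image.

```python
import numpy as np

def conv2d(img, kernel):
    """Each output pixel is the weighted average of a small region of the
    input image, with the weights given by the kernel ('valid' mode)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
box = np.full((3, 3), 1.0 / 9.0)   # 3x3 averaging kernel
print(conv2d(img, box))            # 2x2 output: [[5, 6], [9, 10]]
```

In the disclosure, the weights of this kernel are not fixed but generated from the question feature, which is what turns an ordinary convolution into a fusion step.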
In embodiments of the present disclosure, the question feature may be processed using a fully connected neural network to obtain the dynamic convolution kernel; that is, the dynamic convolution kernel can be obtained by processing the question feature through full connection. The question feature obtained by the second model may be in one-dimensional matrix form, while the dynamic convolution kernel obtained through full connection may be in multi-dimensional matrix form, such as two-dimensional or three-dimensional. After the dynamic convolution kernel is determined, the convolution operation on the image feature can be performed according to the dynamic convolution kernel, so as to obtain a fused feature in which the question information and the image information are fused. By performing classification learning on the fused feature, the answer information for the target question with respect to the target image can then be obtained.
The fused feature may be input into a trained classifier or classification module, and classification learning may be performed on the fused feature using the classification model corresponding to the classification module, finally obtaining the answer to the target question. The classification model may include a deep learning neural network model, which may be a classification model trained to a high recognition accuracy.
Based on the above, recognition of the answer to the target question in the target image can be completed. Because the convolution kernel used in the convolution operation is derived from the question feature, the question feature and the image feature are fused effectively, rather than through a bilinear fusion scheme, which improves recognition accuracy and thereby the accuracy of the predicted answer.
To explain the embodiment of the present disclosure more clearly, its details are described separately below.
As described above, after the dynamic convolution kernel is determined, the convolution operation on the image feature can be executed with it. In the embodiment of the present disclosure, the dynamic convolution kernel and the image feature may be input into a convolution module, which then executes the convolution operation. The embodiment may include at least one convolution module. When there is a single convolution module, the fusion feature is obtained from the convolution operation that this module performs on the image feature based on the dynamic convolution kernel. When there are multiple convolution modules, they can be connected in sequence to form a linear connection structure, and the convolution operations along this structure are executed in turn according to the dynamic convolution kernel to obtain the fusion feature.
Fig. 2 shows a block diagram of a convolution module according to the embodiment of the present disclosure. A convolution module may include at least one convolution unit group, and the convolution operation is correspondingly executed by these convolution unit groups. Fig. 2 shows eight convolution unit groups, but other embodiments of the disclosure may include any number of them; the present disclosure does not limit this. In addition, each convolution unit group may include at least one convolution unit (A1, A2, A3, ...). Fig. 2 shows three convolution units, but this is not a limitation of the embodiment; the number of convolution units in other embodiments can be set as required. The convolution units in each group are connected in sequence. The input data fed to the convolution module can be divided into groups of feature sub-data; the first convolution unit A1 of each convolution unit group executes a convolution operation on the received feature sub-data and outputs the convolution result to the second convolution unit A2, each subsequent convolution unit further convolves the result of the previous unit, and the last convolution unit finally produces the first feature data. The input data of a convolution module may include the image feature or the output result of the previous convolution module. The first feature data obtained may be used to determine the final fusion feature, or may be output as the input data of the next convolution module. When the embodiment includes only one convolution module, its input data is the image feature and the first feature data it outputs is the fusion feature; when it includes multiple convolution modules, the input data of each module is the image feature or the output of the previous module, and the first feature data output by each module can be used to determine the fusion feature.
In the embodiment of the present disclosure, at least one convolution unit group in each convolution module executes its convolution operation with the dynamic convolution kernel. For example, only one convolution unit group may use the dynamic convolution kernel, all convolution unit groups may use it, or some subset of them may. In Fig. 2, the first convolution unit group executes its convolution operation with the dynamic convolution kernel, while the convolution unit groups other than those using the dynamic kernel may use a default convolution kernel. The default convolution kernel may be a kernel determined by back-propagation for the corresponding convolution unit group, for example a static convolution kernel, or it may be a pre-configured convolution kernel. With this configuration, at least part of the convolution unit groups in a convolution module use the dynamic convolution kernel, which realizes the fusion of the question feature and the image feature while also limiting the large amount of computation that introducing the question feature into the convolution would otherwise bring, thereby increasing the operation speed. Further, within a convolution unit group that executes its convolution with the dynamic kernel, at least one convolution unit may use the dynamic kernel, which further reduces computation while improving operation accuracy.
In addition, as described above, the embodiment of the present disclosure may include multiple convolution modules connected in sequence to form a linear structure, each of which in turn executes a convolution operation with the dynamic convolution kernel. The input data of the first convolution module may be the image feature; it convolves the image feature with the dynamic convolution kernel to obtain a first output result, the second convolution module then convolves the first output result according to the dynamic convolution kernel, and so on, with the last convolution module outputting the fusion feature. Fig. 3 shows a flow chart of step S300 of the image-based question answering processing method according to the embodiment of the present disclosure, where step S300 may include:
S301: inputting the dynamic convolution kernel into each of N convolution modules, where the convolution modules are connected in sequence to form a linear connection structure and N is an integer greater than 1;
S302: executing, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result;
S303: executing, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and obtaining the fusion feature from the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
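Steps S301-S303 above amount to a simple loop over a chain of modules, as in the following sketch (the module interface and the toy stand-in for the convolution are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def run_linear_chain(image_feat, dynamic_kernel, modules):
    # S301: the same dynamic kernel is handed to every module in the chain.
    x = image_feat
    for module in modules:
        # S302/S303: the first module convolves the image feature; each
        # later module convolves the output of the previous module.
        x = module(x, dynamic_kernel)
    return x  # the N-th module's output is the fusion feature

# Toy stand-in for a convolution module: scale by the kernel's mean.
toy_module = lambda x, kern: x * kern.mean()

image_feat = np.ones((2, 2))
fused = run_linear_chain(image_feat, np.full((3, 3), 2.0), [toy_module] * 3)
print(fused[0, 0])  # 8.0
```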
Fig. 4 shows a block diagram in which the convolution modules of the embodiment of the present disclosure are connected in sequence to form a linear connection structure. The linear connection structure may include multiple sequentially connected convolution modules (QGHC modules), for example the three convolution modules B, C, and D; other embodiments of the disclosure may include a different number of convolution modules, such as one convolution module or another plurality of convolution modules.
The dynamic convolution kernel obtained in step S200 can be input into each of the convolution modules B, C, and D, so that each convolution module executes its convolution operation according to the dynamic kernel. Convolution module B receives the image feature obtained in step S100 and convolves it with the input dynamic convolution kernel; within module B, at least one convolution unit group may use the dynamic kernel to convolve the image feature, while the remaining convolution unit groups use a default convolution kernel that is unrelated to the question feature. Convolution module C then convolves the first output result with the dynamic convolution kernel to obtain a second output result; likewise, at least one convolution unit group in module C uses the dynamic kernel to convolve the first output result, and the rest use the question-independent default kernel. Finally, convolution module D convolves the second output result with the dynamic convolution kernel to obtain the fusion feature, again with at least one convolution unit group using the dynamic kernel and the remaining groups using the question-independent default kernel.
Because the embodiment of the present disclosure performs multiple convolution operations through multiple convolution modules, the image feature and the question feature can be fused sufficiently, improving the accuracy of the answer prediction. At the same time, only part of the convolution units in each convolution module use the dynamic convolution kernel, which reduces computation and increases operation speed.
As described above, each convolution module in the embodiment of the present disclosure includes at least one convolution unit group. Fig. 5 shows a flow chart of executing the convolution operation by each convolution module according to the embodiment of the present disclosure, where executing the convolution operation by each convolution module may include:
S3001: dividing the input data fed to the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the previous convolution module;
S3002: correspondingly distributing each group of feature sub-data to each convolution unit group, distributing the dynamic convolution kernel to at least one convolution unit group, and distributing the default convolution kernel to each convolution unit group other than those assigned the dynamic convolution kernel;
S3003: executing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel.
That is, on receiving input data, each convolution module divides it into as many groups of feature sub-data as there are convolution unit groups. This amounts to a dimension reduction of the input data, which further reduces the data volume and simplifies the computation.
At least one convolution unit group in the convolution module executes its convolution operation with the dynamic convolution kernel, and within such a group at least one convolution unit uses the dynamic kernel; the remaining convolution units or convolution unit groups use the default convolution kernel, and the convolution results are finally obtained.
In addition, to strengthen the fusion between the question feature and the image feature, a feature mixing unit may mix the corresponding results while each convolution unit group executes its convolution operation. Fig. 5 shows a flow chart of step S3003 according to the embodiment of the present disclosure, which may further include:
S30031: selecting first partial data from the first feature data obtained by the convolution unit group that executes the convolution operation with the dynamic convolution kernel, where the first feature data is the data generated by the convolution units using the dynamic convolution kernel;
S30032: substituting the first partial data for second partial data of the second feature data obtained by the convolution unit groups that execute the convolution operation with the default convolution kernel, obtaining third feature data;
S30033: obtaining the output result of the convolution module based on the first feature data and the third feature data.
As shown in Fig. 2, a feature mixing unit A4 may be placed between the second convolution unit A2 and the third convolution unit A3 in each convolution unit group. The output of the second convolution unit A2 in the first convolution unit group is first feature data, while the outputs of the second and third convolution unit groups are second feature data. It should be noted that, in the embodiment of the present disclosure, the results obtained by convolution unit groups that use the dynamic convolution kernel are collectively called first feature data, i.e., data generated by convolution units executing the convolution operation with the dynamic kernel, while the results obtained by groups that use the default convolution kernel are collectively called second feature data. The first feature data and second feature data produced by different convolution units may differ; the terms serve only to distinguish results of the dynamic convolution kernel from those of the default convolution kernel. To raise the degree of fusion between the question and the image feature, the feature mixing unit can select first partial data from the first feature data obtained by the convolution unit A2 of the first convolution unit group, and substitute it for part of the second feature data output by the second and third convolution unit groups, forming third feature data. In this way, the remaining convolution unit groups that execute the convolution operation with the default kernel also contain question-related features; that is, the third feature data may also include features related to the question information. The third convolution unit A3 of each group can then execute the convolution operation on the first feature data and the third feature data respectively, obtaining the fusion feature.
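The substitution performed by the feature mixing unit can be sketched as follows (channel counts and the helper name `mix_features` are assumptions for illustration; the patent does not fix these details):

```python
import numpy as np

def mix_features(first_feature, second_feature, n_replace):
    # S30031: select part of the dynamic-path output, which carries
    # question information.
    first_part = first_feature[:n_replace]
    # S30032: substitute it into the default-path output to form the
    # third feature data, so that default-kernel unit groups also
    # contain question-related features.
    third_feature = second_feature.copy()
    third_feature[:n_replace] = first_part
    # S30033: both tensors feed the subsequent convolution units (A3).
    return first_feature, third_feature

first = np.full((4, 2, 2), 1.0)   # produced with the dynamic kernel
second = np.zeros((4, 2, 2))      # produced with a default kernel
_, third = mix_features(first, second, n_replace=1)
print(third[0, 0, 0], third[1, 0, 0])  # 1.0 0.0
```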
In the embodiment of the present disclosure, the number of convolution units in the convolution unit groups of each convolution module may be the same or different. Steps S30032-S30033 can be executed by convolution units of the same level, where at least one convolution unit in that level executes its convolution with the dynamic kernel, so that partial data containing the question feature can replace part of the feature data output by the remaining convolution units. In other embodiments of the disclosure, the steps may instead be executed by convolution units of different levels; likewise, at least one of those units executes its convolution with the dynamic kernel, so that partial data containing the question feature replaces part of the feature data output by the remaining units. After the partial data is replaced, the subsequent convolution units continue the convolution operation, finally yielding the output result of the convolution module.
Through the above configuration, data that has fused the question feature is substituted into the remaining convolution unit groups that execute the convolution operation with a static convolution kernel. Without requiring every group to perform a dynamic-kernel convolution, this greatly improves operation speed while effectively realizing feature fusion and improving prediction accuracy.
In addition, in the embodiment of the present disclosure, every convolution unit group in a convolution module may be assigned the dynamic convolution kernel, i.e., each convolution unit group in the module executes the convolution operation on its assigned feature sub-data with the dynamic kernel, and within each group at least one convolution unit uses the dynamic kernel. In this case, executing the convolution operation by each convolution module may include:
dividing the input data fed to the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the previous convolution module;
correspondingly distributing each group of feature sub-data to each convolution unit group, and distributing the dynamic convolution kernel to every convolution unit group;
executing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel.
As described above, every convolution unit group in each convolution module of the embodiment may be assigned the dynamic convolution kernel, so that the feature data each group obtains after convolving its corresponding feature sub-data with the dynamic kernel contains both question information and image information, and the output result formed from the feature data of all groups adequately fuses the question feature and the image feature. Executing the convolution operation on the input feature sub-data based on the assigned dynamic kernel may include obtaining the feature data output by each convolution unit group and obtaining the output result of the convolution module from that feature data. This output result fuses the image information and the question information, and the output result of the last convolution module finally gives the fusion feature.
Obtaining the output result of the convolution module from the feature data output by each convolution unit group may include concatenating the feature data of all groups to form the output result. It may also include replacing part of the feature data output by one convolution unit group into the feature data output by the remaining groups, and combining the replaced feature data to form the output result, with the output of the last convolution module giving the fusion feature. This configuration fuses the question information and the image information more effectively and improves the prediction accuracy of the answer.
After the fusion feature is obtained, the preset classification model corresponding to the classification module can analyze and recognize the fusion feature to obtain the answer to the target question. The classification model may include a deep learning network model that performs classification on the input fusion feature and finally obtains the answer information for the target question. Those skilled in the art can construct the classification model according to different settings or requirements, i.e., the functional structure of the classification model can be realized with existing means, and the present disclosure does not describe it in detail.
In addition, before classification learning is performed on the fusion feature, the fusion feature may also be weighted according to weights determined for it, further improving prediction accuracy.
Fig. 6 shows a flow chart of determining the answer to the target question in the target image based on the fusion feature according to the embodiment of the present disclosure, i.e., step S300 in the embodiment may include:
S304: obtaining an attention weight matrix corresponding to the fusion feature;
S305: weighting and summing the corresponding features of the fusion feature based on the attention weight matrix;
S306: analyzing and recognizing the weighted and summed fusion feature with the preset classification model to obtain the answer to the target question.
In the embodiment of the present disclosure, each parameter of the attention weight matrix represents the weight of the corresponding parameter of the fusion feature. The fusion feature output by the convolution modules can be added to the question feature over the different image regions, and the addition result is then mapped to the attention weight matrix by a 1x1 convolution kernel and a softmax operation. That is, the attention weight matrix in the embodiment can be obtained from the corresponding addition of the fusion feature and the question feature, where the question feature may be obtained from the target question after linear processing, or through a recurrent neural network; the present disclosure does not limit this.
After the attention weight matrix is determined, the fusion feature can be weighted and summed with the weights in the attention weight matrix, yielding an optimized fusion feature. The classification model can then analyze and recognize this optimized fusion feature to obtain the answer to the target question. Because the attention weight matrix represents the weight of each parameter of the fusion feature, performing recognition and classification after optimizing the fusion feature with the attention weights can further improve prediction accuracy.
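Steps S304-S305 above can be sketched as follows, assuming the fusion feature is a channel-by-region tensor and the 1x1 convolution is a per-channel weight vector (the shapes and the name `attention_answer_feature` are illustrative assumptions, not the patent's exact formulation):

```python
import numpy as np

def attention_answer_feature(fusion, q_proj, w_1x1):
    # S304: add the (linearly processed) question feature on every image
    # region, map with a 1x1 convolution, and softmax the result to get
    # the attention weight matrix over regions.
    scores = np.tensordot(w_1x1, fusion + q_proj[:, None, None], axes=1)
    w = np.exp(scores - scores.max())
    attn = w / w.sum()                       # attention weight matrix (H, W)
    # S305: weight and sum the fusion feature with the attention matrix;
    # the pooled feature is what the classifier consumes in S306.
    return (fusion * attn[None]).sum(axis=(1, 2))

fusion = np.ones((4, 3, 3))                  # fusion feature (C, H, W)
pooled = attention_answer_feature(fusion, np.zeros(4), np.ones(4))
print(pooled)  # [1. 1. 1. 1.]
```

With uniform inputs the softmax yields equal weights, so pooling recovers the per-channel value; with real features, regions relevant to the question receive higher weight.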
To make the embodiment of the present disclosure more complete and clear, it is further explained below with reference to Fig. 7, which shows a process schematic of the image-based question answering processing method according to the embodiment of the present disclosure.
First, a target image and a target question are obtained, where the target question is, for example, "How many bicycles are included?". After the target image and target question are obtained, the first model can extract the image feature corresponding to the target image, and the second model can extract the question feature corresponding to the target question. The dynamic convolution kernel is then obtained from the question feature and input into each of the three convolution modules B, C, and D. Convolution module B executes the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result, convolution module C executes the convolution operation on the first output result based on the dynamic kernel to obtain a second output result, and convolution module D executes the convolution operation on the second output result based on the dynamic kernel to obtain the fusion feature. Each convolution module may also introduce a default convolution kernel to reduce computation. After the fusion feature is obtained, the attention weight matrix can be obtained based on the fusion feature and the linearly processed question feature; the weighted sum of the attention weight matrix and the fusion feature gives the optimized fusion feature, and classification learning with the classification model on this optimized fusion feature yields the answer "2" to the target question. The above is only an exemplary illustration of the embodiment of the present disclosure and is not intended as a limitation of it.
In conclusion after obtaining characteristics of image and problem characteristic, then being determined according to problem characteristic in the embodiment of the present disclosure
Dynamic convolution kernel generates to execute the convolution operation of characteristics of image due to having merged problem characteristic in dynamic convolution kernel
Fusion feature (visual signature) in can close blending image and problem information, improve the identification of the answer for target problem
Precision.In addition, the convolution mould group in the embodiment of the present disclosure can introduce the default convolution kernel unrelated with problem, can guarantee to scheme
Under the premise of the close fusion of picture and problem, operand can also be reduced, operation time is saved.
It will be understood by those skilled in the art that each step writes sequence simultaneously in the above method of specific embodiment
It does not mean that stringent execution sequence and any restriction is constituted to implementation process, the specific execution sequence of each step should be with its function
It can be determined with possible internal logic.
Fig. 8 shows a block diagram of an image-based question answering processing apparatus according to the embodiment of the present disclosure. As shown in Fig. 8, the apparatus may include:
a feature obtaining module 10, configured to obtain the image feature of a target image and to obtain the question feature of a target question, where the target question is a question directed at the target image;
a feature fusion module 20, configured to determine a dynamic convolution kernel based on the question feature, and to execute a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature;
a determining module 30, configured to determine, based on the fusion feature, the answer to the target question in the target image.
In the embodiment of the present disclosure, the feature fusion module includes N convolution modules connected in sequence to form a linear connection structure, where N is an integer greater than 1; and
the feature fusion module is further configured to input the dynamic convolution kernel into each of the N convolution modules, and to execute, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result; and
to execute, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and to obtain the fusion feature from the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
In the embodiment of the present disclosure, each convolution module includes at least one convolution unit group;
and the feature fusion module is further configured to divide the input data fed to each convolution module into at least one group of feature sub-data, correspondingly distribute each group of feature sub-data to each convolution unit group, and distribute the dynamic convolution kernel to at least one convolution unit group, so that each convolution unit group executes the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel;
where each convolution unit group other than those assigned the dynamic convolution kernel is assigned the default convolution kernel;
and the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the previous convolution module.
In the embodiment of the present disclosure, the feature fusion module is further configured to select first partial data from the first feature data obtained by the convolution unit group that executes the convolution operation with the dynamic convolution kernel, where the first feature data is the data generated by the convolution units using the dynamic convolution kernel;
to substitute the first partial data for second partial data of the second feature data obtained by the convolution unit groups that execute the convolution operation with the default convolution kernel, obtaining third feature data;
and to obtain the output result of the convolution module based on the first feature data and the third feature data.
In the embodiment of the present disclosure, each convolution module includes at least one convolution unit group;
the feature fusion module is further configured to divide the input data fed to each convolution module into at least one group of feature sub-data, correspondingly distribute each group of feature sub-data to each convolution unit group, and distribute the dynamic convolution kernel to every convolution unit group, so that each convolution unit group executes the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel;
where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the previous convolution module.
In the embodiments of the present disclosure, the feature fusion module is further configured to obtain the output result of the corresponding convolution module group based on the feature data output by each convolution unit group.
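A minimal sketch of this grouped arrangement (the 1x1 kernels, the two-way split, and channel-wise concatenation of the group outputs are assumptions; the disclosure does not fix these details):

```python
import numpy as np

rng = np.random.default_rng(1)
groups = 2
x = rng.standard_normal((8, 6, 6))                 # input data (C, H, W)
sub_data = np.split(x, groups, axis=0)             # one feature sub-data per unit group

# One dynamic 1x1 kernel per convolution unit group (an assumed layout).
dynamic_kernels = [rng.standard_normal((4, 4)) for _ in range(groups)]

# Each convolution unit group convolves its sub-data with its dynamic kernel...
group_outputs = [np.einsum('oc,chw->ohw', w, f)
                 for w, f in zip(dynamic_kernels, sub_data)]

# ...and the module output is obtained from the per-group feature data,
# here by channel-wise concatenation.
module_output = np.concatenate(group_outputs, axis=0)
print(module_output.shape)                         # (8, 6, 6)
```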
In the embodiments of the present disclosure, the determining module is further configured to analyze and identify the fusion feature using a preset classification model to obtain the answer to the target question.
In the embodiments of the present disclosure, the determining module is further configured to obtain an attention weight matrix corresponding to the fusion feature, weight and sum the corresponding features of the fusion feature based on the attention weight matrix, and analyze and identify the weighted and summed fusion feature using a preset classification model to obtain the answer to the target question.
In the embodiments of the present disclosure, the preset classification model includes a deep learning neural network model.
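A sketch of the attention-weighted classification step, under stated assumptions: the attention scores, the linear-plus-softmax "classification model", and the answer-vocabulary size of 10 are stand-ins, not the disclosed model:

```python
import numpy as np

rng = np.random.default_rng(2)
fusion = rng.standard_normal((16, 7, 7))     # fusion feature (C, H, W)
n_answers = 10                               # assumed answer vocabulary size

# Attention weight matrix over spatial positions (softmax-normalised scores).
scores = rng.standard_normal((7, 7))
weights = np.exp(scores) / np.exp(scores).sum()

# Weighted sum of the corresponding features of the fusion feature.
pooled = np.einsum('chw,hw->c', fusion, weights)     # (16,)

# A stand-in "preset classification model": one linear layer plus softmax.
W = rng.standard_normal((n_answers, 16))
b = np.zeros(n_answers)
logits = W @ pooled + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()
answer = int(np.argmax(probs))               # index of the predicted answer
```

The attention weights let the classifier focus on the spatial locations of the fusion feature most relevant to the question before the answer is predicted.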
In the embodiments of the present disclosure, the feature fusion module is further configured to process the question feature using a fully connected neural network to obtain the dynamic convolution kernel.
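The mapping from question feature to dynamic kernel can be sketched as a single fully connected layer whose output is reshaped into kernel weights. The 256-dimensional question feature and the 8x8x3x3 kernel geometry are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
q = rng.standard_normal(256)                 # question feature vector (assumed size)
in_ch, out_ch, k = 8, 8, 3                   # assumed kernel geometry

# Fully connected layer mapping the question feature to kernel weights.
W_fc = rng.standard_normal((out_ch * in_ch * k * k, 256)) * 0.01
dynamic_kernel = (W_fc @ q).reshape(out_ch, in_ch, k, k)
print(dynamic_kernel.shape)                  # (8, 8, 3, 3)
```

Because the kernel weights are computed from the question at inference time, the same image feature is convolved differently for different questions.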
In the embodiments of the present disclosure, the feature obtaining module is further configured to extract the image feature of the target image using a first model and extract the question feature of the target question using a second model.
In the embodiments of the present disclosure, the first model includes a deep convolutional network model, and the second model includes a recurrent neural network model.
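The two extraction paths can be sketched with toy stand-ins: a single 3x3 convolution in place of a full CNN backbone, and a minimal vanilla RNN over word embeddings in place of the recurrent model. All sizes, the example question, and the random weights are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# First model: a toy deep-convolutional stage (one 3x3 valid convolution + ReLU).
image = rng.standard_normal((3, 16, 16))                 # RGB target image
kernels = rng.standard_normal((8, 3, 3, 3)) * 0.1        # (out, in, kH, kW)

def conv2d_valid(x, w):
    co, ci, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((co, H, W))
    for i in range(H):
        for j in range(W):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.einsum('oihw,ihw->o', w, patch)
    return out

image_feature = np.maximum(conv2d_valid(image, kernels), 0)

# Second model: a minimal recurrent network over word embeddings of the question.
question = ["what", "colour", "is", "the", "cat"]
vocab = {w: i for i, w in enumerate(sorted(set(question)))}
embed = rng.standard_normal((len(vocab), 16))
Wh = rng.standard_normal((32, 32)) * 0.1
Wx = rng.standard_normal((32, 16)) * 0.1
h = np.zeros(32)
for word in question:
    h = np.tanh(Wh @ h + Wx @ embed[vocab[word]])
question_feature = h                                     # final hidden state
print(image_feature.shape, question_feature.shape)       # (8, 14, 14) (32,)
```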
In the embodiments of the present disclosure, the device further includes:
an input module, configured to obtain the target image from stored image data and/or receive a transmitted target image, and configured to receive an input target question and/or recognize the target question in the target image.
In some embodiments, the functions or modules of the device provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for specific implementations, refer to the descriptions of the method embodiments above, which are not repeated here for brevity.
The embodiments of the present disclosure further propose a computer-readable storage medium on which computer program instructions are stored, the computer program instructions implementing the above method when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
The embodiments of the present disclosure further propose an electronic equipment, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above method.
Fig. 9 is a block diagram of an electronic equipment 800 according to an exemplary embodiment. For example, the electronic equipment 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, or a personal digital assistant.
Referring to Fig. 9, the electronic equipment 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the electronic equipment 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support the operation of the electronic equipment 800. Examples of such data include instructions of any application or method operated on the electronic equipment 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power supply component 806 provides power to the various components of the electronic equipment 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic equipment 800.
The multimedia component 808 includes a screen providing an output interface between the electronic equipment 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic equipment 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic equipment 800 is in an operation mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the electronic equipment 800. For example, the sensor component 814 may detect the open/closed state of the electronic equipment 800 and the relative positioning of components, such as the display and keypad of the electronic equipment 800. The sensor component 814 may also detect a change in position of the electronic equipment 800 or a component thereof, the presence or absence of user contact with the electronic equipment 800, the orientation or acceleration/deceleration of the electronic equipment 800, and a change in temperature of the electronic equipment 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic equipment 800 and other devices. The electronic equipment 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to promote short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic equipment 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic equipment 800 to complete the above method.
Figure 10 is a block diagram of an electronic equipment 1900 according to an exemplary embodiment. For example, the electronic equipment 1900 may be provided as a server. Referring to Fig. 10, the electronic equipment 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. The applications stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions so as to perform the above method.
The electronic equipment 1900 may also include a power supply component 1926 configured to perform power management of the electronic equipment 1900, a wired or wireless network interface 1950 configured to connect the electronic equipment 1900 to a network, and an input/output (I/O) interface 1958. The electronic equipment 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic equipment 1900 to complete the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device capable of retaining and storing instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions to implement various aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, so that a series of operational steps are executed on the computer, other programmable data processing apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other devices implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the possible architecture, functionality, and operation of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two successive blocks may, in fact, be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A question-and-answer processing method based on an image, characterized by comprising:
obtaining an image feature of a target image and obtaining a question feature of a target question, wherein the target question is a question directed at the target image;
determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature;
determining, based on the fusion feature, an answer to the target question in the target image.
2. The method according to claim 1, characterized in that performing the convolution operation on the image feature based on the dynamic convolution kernel to obtain the fusion feature comprises:
inputting the dynamic convolution kernel into N convolution module groups respectively, wherein the convolution module groups are connected in sequence to form a linear connection structure, and N is an integer greater than 1;
performing, by the first convolution module group in the linear connection structure, a convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result;
performing, by the i-th convolution module group in the linear connection structure, a convolution operation on the output result of the (i-1)-th convolution module group based on the dynamic convolution kernel, and obtaining the fusion feature from the N-th convolution module group, wherein i is an integer greater than 1 and less than or equal to N.
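The linear chain of claim 2 can be sketched as follows; the 1x1 kernel, the shared kernel across modules, and the ReLU between modules are illustrative assumptions, not claim limitations:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 3                                            # number of convolution module groups
x = rng.standard_normal((8, 5, 5))               # image feature (C, H, W)
dynamic_kernel = rng.standard_normal((8, 8)) * 0.2   # shared 1x1 dynamic kernel

def module(feat, w):
    """One convolution module group: 1x1 dynamic convolution plus ReLU."""
    return np.maximum(np.einsum('oc,chw->ohw', w, feat), 0)

# Linear connection: module 1 consumes the image feature, module i consumes
# the output of module i-1, and module N yields the fusion feature.
out = module(x, dynamic_kernel)
for _ in range(N - 1):
    out = module(out, dynamic_kernel)
fusion_feature = out
print(fusion_feature.shape)                      # (8, 5, 5)
```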
3. The method according to claim 2, characterized in that each convolution module group includes at least one convolution unit group, wherein performing the convolution operation by each convolution module group comprises:
dividing the input data of the convolution module group into at least one group of feature sub-data, wherein the number of groups of the feature sub-data corresponds to the number of the convolution unit groups, and the input data includes the image feature or the output result of the convolution module group preceding the current convolution module group;
distributing each group of feature sub-data to a corresponding convolution unit group, and distributing the dynamic convolution kernel to at least one convolution unit group, wherein each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel is assigned a default convolution kernel;
performing, by each convolution unit group, a convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel or default convolution kernel.
4. The method according to claim 3, characterized in that performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel or default convolution kernel comprises:
selecting first-part data from the first feature data obtained by the convolution unit group that performs the convolution operation using the dynamic convolution kernel, the first feature data being the data generated by the convolution units that perform the convolution operation using the dynamic convolution kernel;
replacing second-part data of the second feature data, obtained by the convolution unit group that performs the convolution operation using the default convolution kernel, with the first-part data to obtain third feature data;
obtaining the output result of the corresponding convolution module group based on the first feature data and the third feature data.
5. The method according to claim 2, characterized in that each convolution module group includes at least one convolution unit group, wherein performing the convolution operation by each convolution module group comprises:
dividing the input data of the convolution module group into at least one group of feature sub-data, wherein the number of groups of the feature sub-data corresponds to the number of the convolution unit groups, and the input data includes the image feature or the output result of the convolution module group preceding the current convolution module group;
distributing each group of feature sub-data to a corresponding convolution unit group, and distributing the dynamic convolution kernel to each convolution unit group;
performing, by each convolution unit group, a convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel.
6. The method according to claim 5, characterized in that performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel comprises:
obtaining the feature data output by each convolution unit group;
obtaining the output result of the corresponding convolution module group based on the feature data output by each convolution unit group.
7. The method according to claim 1, characterized in that determining, based on the fusion feature, the answer to the target question in the target image comprises:
analyzing and identifying the fusion feature using a preset classification model to obtain the answer to the target question.
8. A question-and-answer processing device based on an image, characterized by comprising:
a feature obtaining module, configured to obtain an image feature of a target image and obtain a question feature of a target question, wherein the target question is a question directed at the target image;
a feature fusion module, configured to determine a dynamic convolution kernel based on the question feature, and perform a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature;
a determining module, configured to determine, based on the fusion feature, an answer to the target question in the target image.
9. An electronic equipment, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which computer program instructions are stored, characterized in that the computer program instructions implement the method according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810884844.5A CN109145970B (en) | 2018-08-06 | 2018-08-06 | Image-based question and answer processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109145970A true CN109145970A (en) | 2019-01-04 |
CN109145970B CN109145970B (en) | 2021-01-12 |
Family
ID=64791740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810884844.5A Active CN109145970B (en) | 2018-08-06 | 2018-08-06 | Image-based question and answer processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145970B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222770A (en) * | 2019-06-10 | 2019-09-10 | 成都澳海川科技有限公司 | A kind of vision answering method based on syntagmatic attention network |
CN111488887A (en) * | 2020-04-09 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence |
CN112016577A (en) * | 2019-05-30 | 2020-12-01 | 华为技术有限公司 | Image processing method and device |
WO2021110147A1 (en) * | 2019-12-06 | 2021-06-10 | 阿里巴巴集团控股有限公司 | Methods and apparatuses for image processing, image training and channel shuffling |
CN113052756A (en) * | 2019-12-27 | 2021-06-29 | 武汉Tcl集团工业研究院有限公司 | Image processing method, intelligent terminal and storage medium |
CN114581746A (en) * | 2022-03-04 | 2022-06-03 | 北京百度网讯科技有限公司 | Object detection method, device, equipment and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140201126A1 (en) * | 2012-09-15 | 2014-07-17 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
CN105701507A (en) * | 2016-01-13 | 2016-06-22 | 吉林大学 | Image classification method based on dynamic random pooling convolution neural network |
CN106649542A (en) * | 2015-11-03 | 2017-05-10 | 百度(美国)有限责任公司 | Systems and methods for visual question answering |
CN107679582A (en) * | 2017-10-20 | 2018-02-09 | 深圳市唯特视科技有限公司 | A kind of method that visual question and answer are carried out based on multi-modal decomposition model |
CN108170816A (en) * | 2017-12-31 | 2018-06-15 | 厦门大学 | A kind of intelligent vision Question-Answering Model based on deep neural network |
2018-08-06: Application CN201810884844.5A filed; granted as CN109145970B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140201126A1 (en) * | 2012-09-15 | 2014-07-17 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
CN106649542A (en) * | 2015-11-03 | 2017-05-10 | 百度(美国)有限责任公司 | Systems and methods for visual question answering |
EP3166049A1 (en) * | 2015-11-03 | 2017-05-10 | Baidu USA LLC | Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering |
CN105701507A (en) * | 2016-01-13 | 2016-06-22 | 吉林大学 | Image classification method based on dynamic random pooling convolution neural network |
CN107679582A (en) * | 2017-10-20 | 2018-02-09 | 深圳市唯特视科技有限公司 | A kind of method that visual question and answer are carried out based on multi-modal decomposition model |
CN108170816A (en) * | 2017-12-31 | 2018-06-15 | 厦门大学 | A kind of intelligent vision Question-Answering Model based on deep neural network |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016577A (en) * | 2019-05-30 | 2020-12-01 | 华为技术有限公司 | Image processing method and device |
CN110222770A (en) * | 2019-06-10 | 2019-09-10 | 成都澳海川科技有限公司 | A kind of vision answering method based on syntagmatic attention network |
CN110222770B (en) * | 2019-06-10 | 2023-06-02 | 成都澳海川科技有限公司 | Visual question-answering method based on combined relationship attention network |
WO2021110147A1 (en) * | 2019-12-06 | 2021-06-10 | 阿里巴巴集团控股有限公司 | Methods and apparatuses for image processing, image training and channel shuffling |
CN113052756A (en) * | 2019-12-27 | 2021-06-29 | 武汉Tcl集团工业研究院有限公司 | Image processing method, intelligent terminal and storage medium |
CN111488887A (en) * | 2020-04-09 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence |
CN111488887B (en) * | 2020-04-09 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence |
CN114581746A (en) * | 2022-03-04 | 2022-06-03 | 北京百度网讯科技有限公司 | Object detection method, device, equipment and medium |
CN114581746B (en) * | 2022-03-04 | 2023-09-05 | 北京百度网讯科技有限公司 | Object detection method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN109145970B (en) | 2021-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109145970A (en) | Question and answer treating method and apparatus, electronic equipment and storage medium based on image | |
CN110210535A (en) | Neural network training method and device and image processing method and device | |
CN109800737A (en) | Face recognition method and device, electronic equipment and storage medium | |
CN109800744A (en) | Image clustering method and device, electronic equipment and storage medium | |
CN110378976A (en) | Image processing method and device, electronic equipment and storage medium | |
CN108764069A (en) | Biopsy method and device | |
CN109816611A (en) | Video repairing method and device, electronic equipment and storage medium | |
CN109697734A (en) | Position and orientation estimation method and device, electronic equipment and storage medium | |
CN109614613A (en) | Descriptive statement localization method and device for images, electronic equipment and storage medium |
CN110189249A (en) | Image processing method and device, electronic equipment and storage medium |
CN109658352A (en) | Image information optimization method and device, electronic equipment and storage medium |
CN107944409A (en) | Video analysis method and device |
CN110060262A (en) | Image segmentation method and device, electronic equipment and storage medium |
CN109919300A (en) | Neural network training method and device and image processing method and device | |
CN110298310A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109615006A (en) | Character recognition method and device, electronic equipment and storage medium | |
CN109829863A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109902738A (en) | Network module and distribution method and device, electronic equipment and storage medium | |
CN109165738A (en) | Neural network model optimization method and device, electronic equipment and storage medium |
CN109255784A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109858614A (en) | Neural network training method and device, electronic equipment and storage medium | |
CN109978891A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109859144A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109711546A (en) | Neural network training method and device, electronic equipment and storage medium | |
CN110458218A (en) | Image classification method and device, classification network training method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||