CN109145970A - Image-based question answering processing method and apparatus, electronic device and storage medium - Google Patents


Info

Publication number
CN109145970A
CN109145970A (application CN201810884844.5A; granted publication CN109145970B)
Authority
CN
China
Prior art keywords: convolution, feature, group, image, dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810884844.5A
Other languages
Chinese (zh)
Other versions
CN109145970B (en)
Inventor
高鹏 (Gao Peng)
李鸿升 (Li Hongsheng)
王晓刚 (Wang Xiaogang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201810884844.5A
Publication of CN109145970A
Application granted
Publication of CN109145970B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

This disclosure relates to an image-based question answering processing method and apparatus, electronic device and storage medium. The method comprises: obtaining an image feature of a target image, and obtaining a question feature of a target question; determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature; and determining, based on the fused feature, an answer to the target question with respect to the target image. Embodiments of the present disclosure can improve the prediction accuracy of the answer.

Description

Image-based question answering processing method and apparatus, electronic device and storage medium
Technical field
Embodiments of the present disclosure relate to the field of computer vision, and in particular to an image-based question answering processing method and apparatus, an electronic device and a storage medium.
Background technique
Image recognition technology can be applied in many fields; for example, computer vision can be combined with language analysis to realize image captioning, language-based image search, visual question answering and the like. Visual question answering refers to the task in which, given a picture and an associated question, a visual question answering system provides an accurate answer text.
In existing research on the visual question answering task, a fusion of the image feature and the question feature is first obtained by a bilinear pooling method, and the final answer is then predicted from the fused feature. However, this feature fusion is performed at a relatively late stage of the visual question answering model and cannot make effective use of the image feature and the question feature.
Summary of the invention
Embodiments of the present disclosure propose an image-based question answering processing method and apparatus, electronic device and storage medium that can make effective use of image features and question features and improve recognition accuracy.
According to a first aspect of the embodiments of the present disclosure, an image-based question answering processing method is provided, comprising:
obtaining an image feature of a target image, and obtaining a question feature of a target question, wherein the target question is a question directed at the target image;
determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature;
determining, based on the fused feature, an answer to the target question with respect to the target image.
In embodiments of the present disclosure, performing the convolution operation on the image feature with the dynamic convolution kernel to obtain the fused feature comprises:
inputting the dynamic convolution kernel into each of N convolution modules, wherein the convolution modules are connected in sequence to form a linear connection structure, and N is an integer greater than 1;
performing, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel, to obtain a first output result;
performing, by the i-th convolution module in the linear connection structure, a convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and obtaining the fused feature from the N-th convolution module, wherein i is an integer greater than 1 and less than or equal to N.
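The linear connection structure above (module 1 convolves the image feature, module i convolves the output of module i-1, and every module receives the dynamic kernel) can be sketched minimally as follows. This is an illustrative sketch under simplifying assumptions: a 1-D "valid" convolution stands in for the real multi-channel 2-D operation, and the names `conv1d` and `fuse` and the kernel values are hypothetical, not the patented implementation.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution: each output is a kernel-weighted sum of a window."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def fuse(image_feature, dynamic_kernel, n_modules=3):
    """Chain n_modules convolutions: module 1 sees the image feature,
    module i sees module i-1's output; all share the dynamic kernel."""
    out = image_feature
    for _ in range(n_modules):
        out = conv1d(out, dynamic_kernel)
    return out

image_feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dynamic_kernel = [0.5, 0.5]  # pretend this was predicted from the question
fused = fuse(image_feature, dynamic_kernel, n_modules=3)  # output of module N
```

Each pass averages neighbouring positions here; in the patent's setting each module would instead apply the question-conditioned kernel over 2-D feature maps.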
In embodiments of the present disclosure, each convolution module comprises at least one convolution unit group, and performing the convolution operation by each convolution module comprises:
splitting the input data of the convolution module into at least one group of feature sub-data, wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data comprises the image feature or the output result of the preceding convolution module;
assigning each group of feature sub-data to a corresponding convolution unit group, assigning the dynamic convolution kernel to at least one convolution unit group, and assigning a default convolution kernel to each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel;
performing, by each convolution unit group, a convolution operation on the input feature sub-data using the assigned dynamic convolution kernel or default convolution kernel.
In embodiments of the present disclosure, performing, by each convolution unit group, the convolution operation on the input feature sub-data using the assigned dynamic convolution kernel or default convolution kernel comprises:
selecting a first part of data from first feature data obtained by a convolution unit group that performs the convolution operation with the dynamic convolution kernel, the first feature data being the data generated by the convolution units that perform the convolution operation with the dynamic convolution kernel;
replacing a second part of second feature data, obtained by a convolution unit group that performs the convolution operation with a default convolution kernel, with the first part of data, to obtain third feature data;
obtaining the output result of the corresponding convolution module based on the first feature data and the third feature data.
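A minimal sketch of this hybrid grouping, assuming just two convolution unit groups and a 1-D simplification: the input is split into two feature sub-data groups, group 0 convolves with the question-derived dynamic kernel, group 1 with a default kernel, and a slice of group 0's output then overwrites the matching slice of group 1's output so question information reaches the statically convolved channels. The function name `hybrid_module`, the kernels, and the `swap` size are illustrative assumptions.

```python
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def hybrid_module(input_data, dynamic_kernel, default_kernel, swap=2):
    half = len(input_data) // 2
    sub_a, sub_b = input_data[:half], input_data[half:]  # feature sub-data
    first_feature = conv1d(sub_a, dynamic_kernel)    # dynamic-kernel group
    second_feature = conv1d(sub_b, default_kernel)   # default-kernel group
    first_part = first_feature[:swap]                # first part of data
    third_feature = first_part + second_feature[swap:]  # second part replaced
    return first_feature + third_feature             # module output

out = hybrid_module([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                    dynamic_kernel=[0.5, 0.5], default_kernel=[1.0, 0.0])
```

The design intuition is that only some channels need the (more expensive) dynamically predicted weights, while the data swap still propagates the question signal everywhere.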
In embodiments of the present disclosure, each convolution module comprises at least one convolution unit group, and performing the convolution operation by each convolution module comprises:
splitting the input data of the convolution module into at least one group of feature sub-data, wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data comprises the image feature or the output result of the preceding convolution module;
assigning each group of feature sub-data to a corresponding convolution unit group, and assigning the dynamic convolution kernel to every convolution unit group;
performing, by each convolution unit group, a convolution operation on the input feature sub-data using the assigned dynamic convolution kernel.
In embodiments of the present disclosure, performing, by each convolution unit group, the convolution operation on the input feature sub-data using the assigned dynamic convolution kernel comprises:
obtaining the feature data output by each convolution unit group;
obtaining the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
In embodiments of the present disclosure, determining, based on the fused feature, the answer to the target question with respect to the target image comprises:
analyzing the fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, determining, based on the fused feature, the answer to the target question with respect to the target image comprises:
obtaining an attention weight matrix corresponding to the fused feature;
computing a weighted sum of the corresponding features of the fused feature based on the attention weight matrix;
analyzing the weighted-sum fused feature using a preset classification model to obtain the answer to the target question.
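The attention step above can be sketched as follows: a weight per spatial position of the fused feature is normalized with a softmax, and the positions are combined by a weighted sum before classification. This is a minimal 1-D sketch with toy scores; the names `softmax` and `attend` are hypothetical, and in practice the scores themselves would be learned from the fused feature.

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(fused_positions, scores):
    """Weighted sum of the fused feature over its spatial positions."""
    weights = softmax(scores)  # the attention weight matrix (1-D here)
    return sum(w * f for w, f in zip(weights, fused_positions))

# Uniform scores give uniform attention: the result is a plain mean.
pooled = attend([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```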
In embodiments of the present disclosure, the preset classification model comprises a deep-learning neural network model.
In embodiments of the present disclosure, determining the dynamic convolution kernel based on the question feature comprises:
processing the question feature with a fully connected neural network to obtain the dynamic convolution kernel.
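A sketch of this kernel-prediction step: a fully connected layer maps the one-dimensional question feature to a flat vector, which is reshaped into the (multi-dimensional) dynamic convolution kernel. The weights and bias here are toy values chosen for illustration; in a real model they would be learned, and the reshape target would match the convolution's kernel shape.

```python
def fully_connected(question_feature, weights, bias):
    """weights: out_dim x in_dim matrix; returns a flat kernel vector."""
    return [sum(w * q for w, q in zip(row, question_feature)) + b
            for row, b in zip(weights, bias)]

def reshape(flat, rows, cols):
    """Reshape the flat output into a 2-D kernel (rows x cols)."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

question_feature = [1.0, -1.0]  # e.g. from a recurrent question encoder
weights = [[0.5, 0.0], [0.0, 0.5], [0.25, 0.25], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
dynamic_kernel = reshape(fully_connected(question_feature, weights, bias), 2, 2)
```

Because the kernel is a function of the question, every convolution that uses it mixes question information into the image feature map.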
In embodiments of the present disclosure, obtaining the image feature of the target image and obtaining the question feature of the target question directed at the target image comprises:
extracting the image feature of the target image with a first model, and extracting the question feature from the target question with a second model.
In embodiments of the present disclosure, the first model comprises a deep convolutional network model, and the second model comprises a recurrent neural network model.
In embodiments of the present disclosure, the method further comprises:
obtaining the target image from stored image data, and/or receiving a transmitted target image;
receiving an input target question and/or recognizing the target question within the target image.
According to a second aspect of the embodiments of the present disclosure, an image-based question answering processing apparatus is provided, comprising:
a feature obtaining module configured to obtain an image feature of a target image and obtain a question feature of a target question, wherein the target question is a question directed at the target image;
a feature fusion module configured to determine a dynamic convolution kernel based on the question feature, and to perform a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature;
a determining module configured to determine, based on the fused feature, an answer to the target question with respect to the target image.
In embodiments of the present disclosure, the feature fusion module comprises N convolution modules connected in sequence to form a linear connection structure, wherein N is an integer greater than 1; and
the feature fusion module is further configured to input the dynamic convolution kernel into each of the N convolution modules, and to perform, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result; and
to perform, by the i-th convolution module in the linear connection structure, a convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and to obtain the fused feature from the N-th convolution module, wherein i is an integer greater than 1 and less than or equal to N.
In embodiments of the present disclosure, each convolution module comprises at least one convolution unit group;
and the feature fusion module is further configured to split the input data of each convolution module into at least one group of feature sub-data, to assign each group of feature sub-data to a corresponding convolution unit group, and to assign the dynamic convolution kernel to at least one convolution unit group, so that each convolution unit group performs a convolution operation on the input feature sub-data using the assigned dynamic convolution kernel or default convolution kernel;
wherein each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel is assigned a default convolution kernel;
and wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data comprises the image feature or the output result of the preceding convolution module.
In embodiments of the present disclosure, the feature fusion module is further configured to select a first part of data from first feature data obtained by a convolution unit group that performs the convolution operation with the dynamic convolution kernel, the first feature data being the data generated by the convolution units that perform the convolution operation with the dynamic convolution kernel;
to replace a second part of second feature data, obtained by a convolution unit group that performs the convolution operation with a default convolution kernel, with the first part of data, to obtain third feature data;
and to obtain the output result of the corresponding convolution module based on the first feature data and the third feature data.
In embodiments of the present disclosure, each convolution module comprises at least one convolution unit group;
the feature fusion module is further configured to split the input data of each convolution module into at least one group of feature sub-data, to assign each group of feature sub-data to a corresponding convolution unit group, and to assign the dynamic convolution kernel to every convolution unit group, so that each convolution unit group performs a convolution operation on the input feature sub-data using the assigned dynamic convolution kernel;
wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data comprises the image feature or the output result of the preceding convolution module.
In embodiments of the present disclosure, the feature fusion module is further configured to obtain the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
In embodiments of the present disclosure, the determining module is further configured to analyze the fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, the determining module is further configured to obtain an attention weight matrix corresponding to the fused feature, to compute a weighted sum of the corresponding features of the fused feature based on the attention weight matrix, and to analyze the weighted-sum fused feature using a preset classification model to obtain the answer to the target question.
In embodiments of the present disclosure, the preset classification model comprises a deep-learning neural network model.
In embodiments of the present disclosure, the feature fusion module is further configured to process the question feature with a fully connected neural network to obtain the dynamic convolution kernel.
In embodiments of the present disclosure, the feature obtaining module is further configured to extract the image feature of the target image with a first model, and to extract the question feature from the target question with a second model.
In embodiments of the present disclosure, the first model comprises a deep convolutional network model, and the second model comprises a recurrent neural network model.
In embodiments of the present disclosure, the apparatus further comprises:
an input module configured to obtain the target image from stored image data and/or to receive a transmitted target image, and configured to receive an input target question and/or to recognize the target question within the target image.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the method described in any one of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; the computer program instructions, when executed by a processor, implement the method described in any one of the above embodiments.
In embodiments of the present disclosure, after the image feature and the question feature are obtained, the dynamic convolution kernel is determined from the question feature and then used to perform the convolution operation on the image feature. Because the question feature is fused into the dynamic convolution kernel, the generated fused feature (a visual feature) closely fuses the image information and the question information, improving the recognition accuracy of the answer.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the disclosure.
Fig. 1 shows a flow chart of an image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 2 shows a block diagram of a convolution module according to an embodiment of the present disclosure;
Fig. 3 shows a flow chart of step S300 of the image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 4 shows a block diagram of convolution modules connected in sequence to form a linear connection structure according to an embodiment of the present disclosure;
Fig. 5 shows a flow chart of performing a convolution operation by each convolution module according to an embodiment of the present disclosure;
Fig. 6 shows a flow chart of determining, based on the fused feature, the answer to the target question with respect to the target image according to an embodiment of the present disclosure;
Fig. 7 shows a process schematic of the image-based question answering processing method according to an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an image-based question answering processing apparatus according to an embodiment of the present disclosure;
Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Specific embodiment
Various exemplary embodiments, features and aspects of the disclosure are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings indicate elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" here means "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, "at least one of A, B and C" may indicate any one or more elements selected from the set consisting of A, B and C.
In addition, numerous specific details are given in the following detailed description to better illustrate the disclosure. Those skilled in the art will appreciate that the disclosure can equally be practiced without certain of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the disclosure.
It will be appreciated that the method embodiments mentioned in the disclosure may, without departing from their principles and logic, be combined with each other to form combined embodiments; for reasons of space, these are not repeated in the disclosure.
In addition, the disclosure also provides an image-based question answering processing apparatus, electronic device, computer-readable storage medium and program, all of which can be used to implement any image-based question answering processing method provided by the disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Embodiments of the present disclosure provide an image-based question answering processing method that can be applied in the field of computer vision to handle image-based question answering. Embodiments of the present disclosure can determine, from an obtained question feature, the dynamic convolution kernel used to perform the convolution operation. Because the dynamic convolution kernel contains feature information related to the question, the question feature and the image feature can be fused effectively while the convolution operation is performed, which in turn improves the recognition accuracy of the resulting visual feature and of the answer.
Fig. 1 shows a flow chart of an image-based question answering processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the image-based question answering processing method may include:
S100: obtaining an image feature of a target image, and obtaining a question feature of a target question, wherein the target question is a question directed at the target image;
S200: determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature with the dynamic convolution kernel to obtain a fused feature;
S300: determining, based on the fused feature, an answer to the target question with respect to the target image.
The image-based question answering processing method in the embodiments of the present disclosure can obtain, from an acquired image and question, the answer to that question with respect to the image. The question may concern the quantity, color, state or other information of an object in the image; for example, for an image containing bicycles, the question may be how many bicycles there are, or what color a bicycle is. The method provided by the embodiments of the present disclosure can identify the answer to such a question in the image. The above examples merely illustrate applications of the embodiments of the present disclosure and do not limit them.
The embodiments of the present disclosure are described in detail below. First, a target image and a target question directed at the target image may be obtained. The target image includes at least one picture, and the target question includes at least one question; that is, the numbers of images and questions are not limited in the embodiments of the present disclosure: one image may correspond to multiple questions, multiple images may correspond to one question, or other forms are possible. In each of these cases, the embodiments of the present disclosure process the question features and image features in essentially the same way.
In the embodiments of the present disclosure, the target image may be obtained by obtaining the target image from stored image data and/or receiving a transmitted target image. That is, the target image may be an image selected from stored image data, where selection information input by a user may be received through an input module and the selected target image determined according to the selection information; the input module may include at least one of a touch screen, a touchpad, a keyboard, an audio input device and the like, and in other embodiments may include other devices for inputting and receiving the selection information. In addition, the target image may also be obtained by communicating with other electronic devices, for example by receiving a transmitted target image through a wired communication module and/or a wireless communication module, where the wireless communication module may transmit data via Bluetooth, WiFi, NFC and the like, and the wired communication module may transmit data via optical fiber, coaxial cable or other wired cables.
In addition, in the embodiments of the present disclosure, the target question may be obtained by receiving an input target question and/or by recognizing the target question within the target image. Likewise, the target question may be based on information input through the input module, or on information received through the wired communication module and/or wireless communication module. The target question may also be contained in the target image, or may be in graphic form, in which case the target question in the picture can be identified by means of text recognition.
After the target image and target question are obtained, the image feature of the target image and the question feature of the target question can be obtained through step S100. The image feature and the question feature can each be expressed in matrix form; the image feature may include parameters such as the gray value of each pixel of the target image, and the question feature may include the linguistic information in the target question, and so on.
In the embodiments of the present disclosure, obtaining the image feature of the target image and obtaining the question feature of the target question directed at the target image may include: extracting the image feature of the target image with a first model, and extracting the question feature from the target question with a second model. The first model may include a deep convolutional network model, and the second model may include a recurrent neural network model; that is, the feature information in the target image can be extracted with the deep convolutional network model to obtain the image feature, and the linguistic information in the target question can be extracted with the recurrent neural network model to obtain the question feature. The first model and the second model in the embodiments of the present disclosure may also be other types of machine learning network models.
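The two extractors can be sketched with minimal stand-ins: a single 1-D smoothing convolution standing in for the deep convolutional (first) model, and a one-unit recurrence standing in for the recurrent (second) model that folds the question's word vectors into one question feature. All names, weights and the scalar simplification are illustrative assumptions rather than the models the disclosure actually trains.

```python
def conv_feature(pixels, kernel=(0.25, 0.5, 0.25)):
    """Toy stand-in for a deep CNN: one 1-D smoothing convolution."""
    k = len(kernel)
    return [sum(pixels[i + j] * kernel[j] for j in range(k))
            for i in range(len(pixels) - k + 1)]

def rnn_encode(word_vectors, w_in=0.5, w_rec=0.5):
    """Toy stand-in for an RNN: h_t = w_in * x_t + w_rec * h_{t-1}."""
    h = 0.0
    for x in word_vectors:
        h = w_in * x + w_rec * h
    return h  # final hidden state acts as the question feature

image_feature = conv_feature([0.0, 4.0, 0.0, 4.0, 0.0])
question_feature = rnn_encode([1.0, 1.0])  # a two-"word" question
```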
In addition, in other embodiments of the disclosure, the image feature in the target image may also be identified by a first preset algorithm; for example, the first preset algorithm may include an algorithm based on wavelet moments, an algorithm based on fractal features, and the like. Any method that can identify the image feature in the target image can be applied to the embodiments of the present disclosure. Meanwhile, the question feature in the target question may also be obtained by a second preset algorithm, where the second preset algorithm may include a text recognition algorithm, a text segmentation algorithm and the like.
After the image feature and question feature are obtained, step S200 can be executed; that is, the dynamic convolution kernel used to perform the convolution operation can be determined based on the question feature. When image processing is performed on a given input image, each pixel in the output image is a weighted average of the pixels in a small region of the input image, where the weights are defined by a function; this function is called the convolution kernel. The embodiments of the present disclosure can determine the relevant parameters of the convolution kernel from the question feature, ultimately forming a dynamic convolution kernel related to the question feature. Thus, when the convolution operation on the image feature is performed, the question feature and image feature can be fused effectively, so that the resulting fused feature retains the spatial information of the target image while closely fusing in the question information.
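The kernel definition above (each output pixel is a weighted average of an input neighbourhood, with the weight function being the kernel) can be shown with a minimal 2-D convolution; the kernel values here are illustrative only, whereas in the disclosure they would be predicted from the question feature.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution: each output pixel is a kernel-weighted
    sum over the corresponding neighbourhood of the input."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[0.25, 0.25],
          [0.25, 0.25]]  # uniform weights: a plain 2x2 average
out = conv2d(image, kernel)
```

With uniform weights each output pixel is simply the mean of its 2x2 neighbourhood; a question-conditioned kernel would weight the neighbourhood differently for different questions.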
In the embodiments of the present disclosure, the question feature may be processed with a fully connected neural network to obtain the dynamic convolution kernel. The question feature obtained by the second model may be in the form of a one-dimensional matrix, while the dynamic convolution kernel obtained through the fully connected processing may be in the form of a multi-dimensional matrix, for example two-dimensional or three-dimensional. After the dynamic convolution kernel is determined, the convolution operation on the image feature may be executed according to the dynamic convolution kernel, so as to obtain a fusion feature in which the question information and the image information are fused. Classification learning may then be performed on the fusion feature to obtain the answer information for the target question in the target image.
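The step from a one-dimensional question feature to a multi-dimensional dynamic kernel can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the fully connected weights are random stand-ins for trained parameters, and the kernel sizes (8 output channels, 8 input channels, 3×3) are arbitrary choices made only to show the reshape and the subsequent convolution.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_dynamic_kernel(q_feat, out_ch, in_ch, k):
    """Fully connected layer: 1-D question feature -> flattened kernel weights,
    reshaped into a multi-dimensional dynamic convolution kernel."""
    fc = rng.standard_normal((out_ch * in_ch * k * k, q_feat.size)) * 0.1
    return (fc @ q_feat).reshape(out_ch, in_ch, k, k)

def conv2d(x, kernel):
    """'Same'-padded 2-D convolution of a (C,H,W) map with an (O,C,k,k) kernel."""
    o_ch, c, k, _ = kernel.shape
    pad = k // 2
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((o_ch, h, w))
    for o in range(o_ch):
        for i in range(h):
            for j in range(w):
                out[o, i, j] = np.sum(kernel[o] * xp[:, i:i + k, j:j + k])
    return out

q_feat = rng.standard_normal(16)              # question feature from the RNN
img_feat = rng.standard_normal((8, 14, 14))   # image feature map from the CNN
kernel = make_dynamic_kernel(q_feat, out_ch=8, in_ch=8, k=3)
fused = conv2d(img_feat, kernel)              # fusion keeps the spatial layout
```

Because the kernel weights are a function of the question, every output position of `fused` mixes image content with question information while the H×W layout survives intact.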
The fusion feature may be input to a trained classifier or classification module, and classification learning may be performed on the fusion feature with the classification model corresponding to the classification module, finally obtaining the answer to the target question. The classification model may include a deep learning neural network model, which may be a classification model with high recognition precision obtained through training.
Based on the above, the recognition of the answer to the target question in the target image can be completed. When the convolution operation is executed, the convolution kernel for executing the convolution operation is obtained according to the question feature, so that the question feature and the image feature can be effectively fused together rather than being fused in a bilinear manner, which can improve recognition precision and the accuracy of prediction.
In order to explain the embodiments of the present disclosure clearly, they are described in detail below.
As described above, after the dynamic convolution kernel is determined, the convolution operation on the image feature may be executed with the dynamic convolution kernel. In the embodiments of the present disclosure, the dynamic convolution kernel and the image feature may be input to a convolution module, and the above convolution operation may then be executed by the convolution module. The embodiments of the present disclosure may include at least one convolution module. When one convolution module is included, the fusion feature is obtained by the convolution operation that the convolution module executes on the image feature based on the dynamic convolution kernel. When multiple convolution modules are included, the multiple convolution modules may be sequentially connected to constitute a linear connection structure, so that each convolution module on the linear connection structure executes a convolution operation according to the dynamic convolution kernel in turn to obtain the fusion feature.
Fig. 2 shows a block diagram of the convolution module according to an embodiment of the present disclosure. The convolution module in the embodiments of the present disclosure may include at least one convolution unit group, and correspondingly the convolution operation may be executed by the at least one convolution unit group. Fig. 2 shows 8 convolution unit groups; other embodiments of the present disclosure may include any number of convolution unit groups, and the present disclosure does not limit this. In addition, each convolution unit group may include at least one convolution unit (A1, A2, A3, ...). Fig. 2 shows three convolution units, but this is not a limitation of the embodiments of the present disclosure, and the number of convolution units in other embodiments may be set according to demand. The convolution units in each convolution unit group may be sequentially connected, and the input data input to the convolution module may be allocated into multiple groups of feature sub-data. The first convolution unit A1 of each convolution unit group may execute a convolution operation on the received feature sub-data and output the convolution result to the second convolution unit A2; each subsequent convolution unit may then further execute a convolution operation according to the convolution result of the previous convolution unit, and first feature data are finally obtained by the last convolution unit. Here the input data input to the convolution module may include the image feature or the output result of the previous convolution module. The obtained first feature data may be used to determine the final fusion feature, or may be output as the input data of the next convolution module. When the embodiments of the present disclosure include only one convolution module, the input data of the convolution module are the image feature, and the output first feature data are the fusion feature. When multiple convolution modules are included, the input data of a convolution module may be the image feature or the output result of the previous convolution module, and the first feature data output by each convolution module may be used to determine the fusion feature.
In the embodiments of the present disclosure, at least one convolution unit group in each convolution module may execute its convolution operation with the dynamic convolution kernel. For example, only one convolution unit group may execute its convolution operation with the dynamic convolution kernel, or all convolution unit groups may execute convolution operations with the dynamic convolution kernel, or a part of the convolution unit groups may execute convolution operations with the dynamic convolution kernel. In Fig. 2, for example, the first convolution unit group executes its convolution operation with the dynamic convolution kernel. The convolution unit groups other than those executing convolution operations with the dynamic convolution kernel may execute convolution operations with a default convolution kernel, which may be a convolution kernel determined for the corresponding convolution unit group through back-propagation, for example a static convolution kernel, or a preconfigured convolution kernel. With this configuration, at least a part of the convolution unit groups in a convolution module can execute convolution operations with the dynamic convolution kernel, so that the fusion of the question feature and the image feature can be realized, while the large computation load introduced by incorporating the question feature into the dynamic convolution kernel is also reduced, improving operation speed. Further, in a convolution unit group that executes convolution operations with the dynamic convolution kernel, at least one convolution unit may execute its convolution operation with the dynamic convolution kernel, which can further reduce the computation load and improve operation precision.
In addition, as described above, the embodiments of the present disclosure may also include multiple convolution modules, which may be sequentially connected to form a linear structure, and each convolution module in turn may execute convolution operations with the dynamic convolution kernel. The input data of the first convolution module may be the image feature; the convolution operation on the image feature may be executed with the dynamic convolution kernel to obtain a first output result, then the second convolution module may execute the convolution operation on the first output result according to the dynamic convolution kernel, and so on, with the fusion feature output by the last convolution module. Fig. 3 shows a flow chart of step S300 of the image-based question answering processing method according to an embodiment of the present disclosure, where step S300 may include:
S301: inputting the dynamic convolution kernel into N convolution modules respectively, where the convolution modules are sequentially connected to constitute a linear connection structure, and N is an integer greater than 1;
S302: executing, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result;
S303: executing, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and obtaining the fusion feature by the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
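The chaining in steps S301-S303 can be sketched as a simple loop. The toy module below is an assumption made only to keep the sketch self-contained: a 1×1 convolution whose weights are the dynamic kernel, standing in for a full convolution module; the real modules of the disclosure contain multiple convolution unit groups.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_module(x, dyn_kernel):
    # Stand-in for one convolution module: a 1x1 convolution whose weights
    # come from the dynamic kernel, so each module mixes question information in.
    return np.tanh(np.einsum('oc,chw->ohw', dyn_kernel, x))

def run_linear_chain(img_feat, dyn_kernel, modules):
    """S301-S303: the same dynamic kernel is fed to all N modules; module 1
    consumes the image feature, module i consumes module i-1's output."""
    x = img_feat
    for module in modules:            # linear connection structure
        x = module(x, dyn_kernel)
    return x                          # output of module N is the fusion feature

img_feat = rng.standard_normal((8, 14, 14))
dyn_kernel = rng.standard_normal((8, 8)) * 0.1     # 1x1 dynamic kernel
fused = run_linear_chain(img_feat, dyn_kernel, [toy_module] * 3)  # N = 3
```

Note that the dynamic kernel is computed once and reused by every module, so only one fully connected mapping from the question feature is needed regardless of N.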
Fig. 4 shows a block diagram in which the convolution modules according to an embodiment of the present disclosure are sequentially connected to constitute a linear connection structure. The linear connection structure may include multiple sequentially connected convolution modules (QGHC modules), for example 3 convolution modules B, C, and D. Other embodiments of the present disclosure may include other numbers of convolution modules, for example 1 convolution module or more than one convolution module.
The dynamic convolution kernel obtained in step S200 may be input into each of the convolution modules B, C, and D respectively, so that each convolution module executes the corresponding convolution operation according to the dynamic convolution kernel. Convolution module B may receive the image feature obtained in step S100 and execute the convolution operation on the image feature with the input dynamic convolution kernel; within convolution module B, at least one convolution unit group may execute the convolution operation on the image feature with the dynamic convolution kernel, while the remaining convolution unit groups execute convolution operations with default convolution kernels, the default convolution kernels being unrelated to the question feature. Convolution module C may then execute the convolution operation on the first output result with the dynamic convolution kernel to obtain a second output result; likewise, within convolution module C, at least one convolution unit group may execute the convolution operation on the first output result with the dynamic convolution kernel, while the remaining convolution unit groups execute convolution operations with default convolution kernels unrelated to the question feature. Finally, convolution module D may execute the convolution operation on the second output result with the dynamic convolution kernel to obtain the fusion feature; likewise, within convolution module D, at least one convolution unit group may execute the convolution operation on the second output result with the dynamic convolution kernel, while the remaining convolution unit groups execute convolution operations with default convolution kernels unrelated to the question feature.
Since the embodiments of the present disclosure can execute multiple convolution operations through multiple convolution modules, the image feature and the question feature can be fully fused, improving the accuracy of the answer to the given question. At the same time, in each convolution module only part of the convolution units execute convolution operations with the dynamic convolution kernel, which reduces the computation load and improves operation speed.
As described above, each convolution module in the embodiments of the present disclosure includes at least one convolution unit group. Fig. 5 shows a flow chart of executing the convolution operation by each convolution module according to an embodiment of the present disclosure, where executing the convolution operation by each convolution module may include:
S3001: dividing the input data input to the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data include the image feature or the output result of the previous convolution module of the convolution module;
S3002: correspondingly distributing each group of feature sub-data to each convolution unit group, distributing the dynamic convolution kernel to at least one convolution unit group, and assigning a default convolution kernel to each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel;
S3003: executing, by each convolution unit group, a convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel or default convolution kernel.
That is, upon receiving input data, each convolution module may divide the input data into feature sub-data of a number of groups corresponding to the number of its convolution unit groups. Dimension reduction can be executed on the input data in this way, which can further reduce the data volume and facilitate operation.
At least one convolution unit group in a convolution module executes its convolution operation with the dynamic convolution kernel, and at least one convolution unit in that convolution unit group executes its convolution operation with the dynamic convolution kernel, while the remaining convolution units or convolution unit groups execute convolution operations with default convolution kernels, finally obtaining the convolution results.
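Steps S3001-S3003 amount to a group convolution in which one group's kernel depends on the question. The sketch below is a minimal illustration under stated assumptions: per-group 1×1 convolutions stand in for the chains of convolution units, and the group count and channel sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def group_convolution(x, dyn_kernel, n_groups, default_kernels):
    """S3001-S3003: split the input channels into n_groups feature sub-data
    groups; group 0 is convolved with the question-derived dynamic kernel,
    the remaining groups with question-independent default kernels."""
    sub = np.split(x, n_groups, axis=0)        # S3001: one sub-array per group
    outs = []
    for g, xg in enumerate(sub):               # S3002: assign kernels to groups
        k = dyn_kernel if g == 0 else default_kernels[g - 1]
        outs.append(np.einsum('oc,chw->ohw', k, xg))  # S3003: per-group 1x1 conv
    return np.concatenate(outs, axis=0)

x = rng.standard_normal((8, 7, 7))             # 8 channels -> 4 groups of 2
dyn = rng.standard_normal((2, 2))              # dynamic kernel for group 0
defaults = [rng.standard_normal((2, 2)) for _ in range(3)]
y = group_convolution(x, dyn, 4, defaults)
```

The grouping also captures the stated efficiency argument: each kernel only sees its own channel slice, so the per-group multiply count is a fraction of a full dense convolution over all 8 channels.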
In addition, in order to reinforce the fusion between the question feature and the image feature, when each convolution unit group executes convolution operations, the corresponding results may be mixed by a feature mixing unit. Fig. 5 shows a flow chart of step S3003 according to an embodiment of the present disclosure, which may further include:
S30031: selecting first part data from the first feature data obtained by the convolution unit group that executes convolution operations with the dynamic convolution kernel, where the first feature data are the data generated by the convolution units that execute convolution operations with the dynamic convolution kernel;
S30032: substituting the first part data for second part data of the second feature data obtained by the convolution unit groups that execute convolution operations with default convolution kernels, obtaining third feature data;
S30033: obtaining the output result of the corresponding convolution module based on the first feature data and the third feature data.
As shown in Fig. 2, a feature mixing unit A4 may also be included between the second convolution unit A2 and the third convolution unit A3 in each convolution unit group, where the output result of the second convolution unit A2 in the first convolution unit group is the first feature data, and the output results of the second and third convolution unit groups are second feature data. It should be noted here that, in the embodiments of the present disclosure, the results obtained by the convolution unit groups that execute convolution operations with the dynamic convolution kernel may be collectively referred to as first feature data, that is, the first feature data are the data generated by convolution units that execute convolution operations with the dynamic convolution kernel; the results obtained by the convolution unit groups that execute convolution operations with default convolution kernels are collectively referred to as second feature data. The first feature data and second feature data obtained by different convolution units may differ; the terms are used here only to distinguish the results of convolution operations executed with the dynamic convolution kernel from those executed with default convolution kernels. In order to improve the degree of fusion between the question and the image feature, the feature mixing unit may select first part data from the first feature data obtained by the convolution unit A2 in the first convolution unit group, and substitute the first part data for a part of the second feature data output toward the third convolution units A3 of the second and subsequent convolution unit groups, forming third feature data. In this way, the remaining convolution unit groups that execute convolution operations with default convolution kernels may also include question-related features, that is, the third feature data may also include question-related features. The third convolution units A3 of each group may then respectively execute convolution operations on the first feature data and the third feature data to obtain the fusion feature.
In the embodiments of the present disclosure, the numbers of convolution units in the convolution unit groups in each convolution module may be the same or different. When the above steps S30032-S30033 are executed, convolution units of the same level may be used to execute the above steps, where at least one convolution unit at that level executes its convolution operation with the dynamic convolution kernel, so that part data including the question feature can replace part of the feature data output by the remaining convolution units. In other embodiments of the present disclosure, convolution units of different levels may also be used to execute the above steps; likewise, at least one of the convolution units at those levels executes its convolution operation with the dynamic convolution kernel, so that part data including the question feature can replace part of the feature data output by the remaining convolution units. After the part data are replaced, the subsequent convolution units may continue to execute convolution operations, finally obtaining the output result of the corresponding convolution module.
Through the above configuration, data in which the question feature has been fused can be substituted into the remaining convolution unit groups that execute convolution operations with static convolution kernels. Without requiring every group to use the dynamic convolution kernel for its convolution operations, operation speed is greatly improved while feature fusion is effectively realized, improving prediction precision.
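The channel-substitution mixing of steps S30031-S30032 can be sketched as a copy over the leading channels. This is an illustrative assumption: the choice of which channels form the "first part data" (here, the first `n_swap` channels) is not specified by the disclosure and is fixed arbitrarily for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

def mix_features(first_feature, second_features, n_swap):
    """S30031-S30032: copy the first n_swap channels of the dynamic-kernel
    group's output over the first n_swap channels of each default-kernel
    group's output, so question information reaches every group."""
    third_features = []
    for second in second_features:
        third = second.copy()
        third[:n_swap] = first_feature[:n_swap]   # substitute the first part data
        third_features.append(third)
    return third_features

first = rng.standard_normal((4, 7, 7))            # from the dynamic-kernel group
seconds = [rng.standard_normal((4, 7, 7)) for _ in range(2)]
thirds = mix_features(first, seconds, n_swap=1)   # third feature data
```

After mixing, the subsequent convolution units in every group operate on data that carry some question-dependent channels, at the cost of a memory copy rather than an extra dynamic convolution.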
In addition, in the embodiments of the present disclosure, each convolution unit group in a convolution module may be assigned the dynamic convolution kernel, that is, every convolution unit group in the convolution module may execute the convolution operation on the corresponding input feature sub-data with the dynamic convolution kernel. Also, at least one convolution unit in each convolution unit group executes its convolution operation with the dynamic convolution kernel. In this case, executing the convolution operation by each convolution module may include:
dividing the input data input to the convolution module into at least one group of feature sub-data, where the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data include the image feature or the output result of the previous convolution module of the convolution module;
correspondingly distributing each group of feature sub-data to each convolution unit group, and distributing the dynamic convolution kernel to each convolution unit group respectively;
executing, by each convolution unit group, a convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel.
As described above, each convolution unit group in each convolution module of the embodiments of the present disclosure may be assigned the dynamic convolution kernel; that is, the feature data obtained by each convolution unit group after executing the convolution operation on the corresponding feature sub-data with the dynamic convolution kernel may include both question information and image information, and the output result obtained from the feature data output by each convolution unit group can fully fuse the question feature and the image feature. Executing the convolution operation on the input feature sub-data by each convolution unit group based on the distributed dynamic convolution kernel may include obtaining the feature data output by each convolution unit group, and obtaining the output result of the corresponding convolution module based on the feature data output by each convolution unit group. The image information and the question information are fused in this output result, and the fusion feature may finally be obtained from the output result of the last convolution module.
Obtaining the output result of the corresponding convolution module based on the feature data output by each convolution unit group may include concatenating the feature data output by each convolution unit group to form the output result, or may include replacing part of the feature data output by the remaining convolution unit groups with a part of the data in the feature data output by one of the convolution unit groups, so that the combination of the replaced feature data forms the output result, with the fusion feature finally obtained from the output result of the last convolution module. Through this configuration, the question information and the image information can be fused more effectively, improving the prediction precision of the answer.
After the fusion feature is obtained, analysis and recognition may be performed on the fusion feature with the preset classification model corresponding to the classification module, obtaining the answer to the target question. The classification model may include a deep learning network model, which can perform classification training on the input fusion feature to finally obtain the answer information for the target question. Those skilled in the art can construct the classification model according to different settings or demands, that is, the functional structure of the classification model can be realized with existing technical means, which the embodiments of the present disclosure do not describe in detail.
In addition, before classification learning is performed with the fusion feature, an operation may also be performed on the fusion feature according to weights determined for the fusion feature, so as to further improve prediction precision.
Fig. 6 shows a flow chart of determining the answer to the target question in the target image based on the fusion feature according to an embodiment of the present disclosure; that is, step S300 in the embodiments of the present disclosure may include:
S304: obtaining an attention weight matrix corresponding to the fusion feature;
S305: performing weighted summation of the corresponding features of the fusion feature based on the attention weight matrix;
S306: performing analysis and recognition on the weighted-sum fusion feature with a preset classification model, obtaining the answer to the target question.
In the embodiments of the present disclosure, each parameter in the attention weight matrix indicates the weight value of the corresponding parameter in the fusion feature. The fusion feature output by the convolution module may be added to the question feature over the different image regions in turn, and the addition results may then be mapped to the attention weight matrix with a 1×1 convolution kernel and a softmax operation. That is, the attention weight matrix in the embodiments of the present disclosure may be obtained through the corresponding summation of the fusion feature and the question feature, where the question feature may be obtained after linearization of the target question, or obtained through a recurrent neural network; the present disclosure does not limit this.
After the attention weight matrix is determined, weighted summation may be executed with the weights of the attention weight matrix and the fusion feature, so as to obtain an optimized fusion feature. Analysis and recognition may then be performed on the optimized fusion feature with the classification model, obtaining the answer to the target question. Since the attention weight matrix indicates the weight of each parameter of the fusion feature, performing recognition and classification after optimizing the fusion feature according to the attention weights can further improve prediction precision.
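The attention steps S304-S305 (add the question feature per region, 1×1 convolution, softmax, weighted sum) can be sketched as follows. Assumptions for the sketch: the question feature is taken as already projected to the same channel count as the fusion feature, and the 1×1 convolution is a single random weight vector.

```python
import numpy as np

rng = np.random.default_rng(5)

def attention_pool(fused, q_feat, w_1x1):
    """S304-S305: add the (broadcast) question feature to every spatial
    position, map each position to a scalar score with a 1x1 convolution,
    softmax over positions, then take the attention-weighted sum."""
    scores = np.einsum('c,chw->hw', w_1x1, fused + q_feat[:, None, None])
    scores = scores - scores.max()                  # numerically stable softmax
    attn = np.exp(scores) / np.exp(scores).sum()    # attention weight matrix
    return np.einsum('hw,chw->c', attn, fused), attn

fused = rng.standard_normal((8, 7, 7))   # fusion feature from the last module
q_feat = rng.standard_normal(8)          # question feature, projected to 8 dims
w = rng.standard_normal(8)               # 1x1 convolution weights
pooled, attn = attention_pool(fused, q_feat, w)
```

The pooled vector is the "optimized fusion feature" handed to the classification model; the softmax guarantees the weights over image regions are non-negative and sum to one.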
In order to make the embodiments of the present disclosure more complete and clear, an embodiment is explained below with reference to Fig. 7, which shows a process schematic of the image-based question answering processing method according to an embodiment of the present disclosure.
First, the target image and the target question are obtained, where the target question is "How many bicycles are included?". After the target image and the target question are obtained, the first model may be used to obtain the image feature corresponding to the target image, and the second model may be used to obtain the question feature corresponding to the target question. The dynamic convolution kernel is then obtained based on the question feature and input to the three convolution modules B, C, and D respectively. The convolution operation on the image feature is executed by convolution module B based on the dynamic convolution kernel to obtain a first output result; the convolution operation on the first output result is executed by convolution module C based on the dynamic convolution kernel to obtain a second output result; and the convolution operation on the second output result is executed by convolution module D based on the dynamic convolution kernel to obtain the fusion feature. Each convolution module may also introduce default convolution kernels to reduce the computation load. After the fusion feature is obtained, the attention weight matrix may be obtained based on the fusion feature and the linearly processed question feature, and the optimized fusion feature is obtained through the weighted summation of the attention weight matrix and the fusion feature. Classification learning is then performed on the optimized fusion feature with the classification model, obtaining the answer "2" to the target question. The above is only an exemplary illustration of the embodiments of the present disclosure and is not intended as a limitation.
In conclusion after obtaining characteristics of image and problem characteristic, then being determined according to problem characteristic in the embodiment of the present disclosure Dynamic convolution kernel generates to execute the convolution operation of characteristics of image due to having merged problem characteristic in dynamic convolution kernel Fusion feature (visual signature) in can close blending image and problem information, improve the identification of the answer for target problem Precision.In addition, the convolution mould group in the embodiment of the present disclosure can introduce the default convolution kernel unrelated with problem, can guarantee to scheme Under the premise of the close fusion of picture and problem, operand can also be reduced, operation time is saved.
Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Fig. 8 shows a block diagram of an image-based question answering processing device according to an embodiment of the present disclosure. As shown in Fig. 8, the image-based question answering processing device may include:
a feature obtaining module 10, configured to obtain an image feature of a target image and obtain a question feature of a target question, where the target question is a question directed at the target image;
a feature fusion module 20, configured to determine a dynamic convolution kernel based on the question feature, and to execute a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature;
a determining module 30, configured to determine, based on the fusion feature, an answer to the target question in the target image.
In the embodiments of the present disclosure, the feature fusion module includes N convolution modules, which are sequentially connected to constitute a linear connection structure, where N is an integer greater than 1; and
the feature fusion module is further configured to input the dynamic convolution kernel into the N convolution modules respectively, and to execute, by the first convolution module in the linear connection structure, the convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result; and
to execute, by the i-th convolution module in the linear connection structure, the convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, and to obtain the fusion feature by the N-th convolution module, where i is an integer greater than 1 and less than or equal to N.
In the embodiments of the present disclosure, each convolution module includes at least one convolution unit group;
and the feature fusion module is further configured to divide the input data input to each convolution module into at least one group of feature sub-data, to correspondingly distribute each group of feature sub-data to each convolution unit group, and to distribute the dynamic convolution kernel to at least one convolution unit group, so that each convolution unit group executes a convolution operation on the input feature sub-data based on the distributed dynamic convolution kernel or default convolution kernel;
where each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel is assigned a default convolution kernel;
and the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data include the image feature or the output result of the previous convolution module of the convolution module.
In an embodiment of the present disclosure, the feature fusion module is further configured to select first-part data from first feature data obtained by the convolution unit groups that perform the convolution operation using the dynamic convolution kernel, the first feature data being the data generated by the convolution units that perform the convolution operation using the dynamic convolution kernel;
to replace second-part data of second feature data, obtained by the convolution unit groups that perform the convolution operation using the default convolution kernel, with the first-part data to obtain third feature data; and
to obtain the output result of the corresponding convolution module based on the first feature data and the third feature data.
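The exchange step can be sketched as below. Which channels form the "first part" and "second part" is an assumption made for illustration; the text leaves the partition unspecified.

```python
import numpy as np

def exchange(first_feat, second_feat, n_swap=1):
    """Feature exchange between unit groups: copy the first n_swap channels
    (the 'first-part data') of the dynamic-kernel output over n_swap channels
    (the 'second-part data') of the default-kernel output, giving the third
    feature data; the module output concatenates first and third feature data,
    so question-conditioned information propagates into every group."""
    third_feat = second_feat.copy()
    third_feat[:n_swap] = first_feat[:n_swap]
    return np.concatenate([first_feat, third_feat], axis=0)
```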
In an embodiment of the present disclosure, each convolution module includes at least one convolution unit group.
The feature fusion module is further configured to divide the input data of each convolution module into at least one group of feature sub-data, distribute each group of feature sub-data to a corresponding convolution unit group, and assign the dynamic convolution kernel to each convolution unit group, so that each convolution unit group performs a convolution operation on its input feature sub-data based on the assigned dynamic convolution kernel.
The number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding the convolution module.
In an embodiment of the present disclosure, the feature fusion module is further configured to obtain the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
In an embodiment of the present disclosure, the determination module is further configured to analyze the fusion feature using a preset classification model to obtain the answer to the target question.
In an embodiment of the present disclosure, the determination module is further configured to obtain an attention weight matrix corresponding to the fusion feature, perform a weighted summation of the corresponding features of the fusion feature based on the attention weight matrix, and analyze the weighted fusion feature using the preset classification model to obtain the answer to the target question.
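The attention-weighted summation can be sketched as follows. This assumes a spatial attention map softmaxed over locations; how the attention weight matrix itself is produced is not fixed by the text.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_and_pool(fusion, attn_logits):
    """Weight each spatial position of the fusion feature by its softmaxed
    attention score and sum over space, producing one vector per channel
    that the preset classification model maps to an answer.
    fusion: (C, H, W); attn_logits: (H, W)."""
    h, w = attn_logits.shape
    weights = softmax(attn_logits.reshape(-1)).reshape(h, w)
    return (fusion * weights).sum(axis=(1, 2))  # -> (C,)
```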
In an embodiment of the present disclosure, the preset classification model includes a deep-learning neural network model.
In an embodiment of the present disclosure, the feature fusion module is further configured to process the question feature with a fully connected neural network to obtain the dynamic convolution kernel.
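A sketch of that mapping, assuming a single fully connected layer, a tanh activation, and a 1x1 kernel shape (all illustrative; the text only requires a fully connected network):

```python
import numpy as np

def make_dynamic_kernel(question_feat, W, b, c_out, c_in):
    """Map the question feature through one fully connected layer to the
    flattened weights of a 1x1 dynamic convolution kernel, then reshape
    into a (c_out, c_in) kernel matrix."""
    flat = np.tanh(W @ question_feat + b)  # (c_out * c_in,)
    return flat.reshape(c_out, c_in)
```

In this way the convolution weights themselves, rather than only the inputs, become a function of the question.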
In an embodiment of the present disclosure, the feature acquisition module is further configured to extract the image feature of the target image using a first model and to extract the question feature of the target question using a second model.
In an embodiment of the present disclosure, the first model includes a deep convolutional network model, and the second model includes a recurrent neural network model.
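The question side of the second model can be sketched with a plain tanh recurrence whose final hidden state serves as the question feature. This is a stand-in only: in practice the recurrent model would typically be an LSTM or GRU, and the image side would be a deep CNN, neither of which is shown here.

```python
import numpy as np

def rnn_question_feature(token_embeddings, Wx, Wh):
    """Run a plain recurrent network over the question's token embeddings
    and return the final hidden state as the question feature.
    token_embeddings: iterable of (d,) vectors; Wx: (h, d); Wh: (h, h)."""
    h = np.zeros(Wh.shape[0])
    for x in token_embeddings:
        h = np.tanh(Wx @ x + Wh @ h)
    return h
```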
In an embodiment of the present disclosure, the device further includes:
an input module configured to obtain the target image from stored image data and/or to receive a transmitted target image, and configured to receive an input target question and/or to recognize the target question in the target image.
In some embodiments, the functions of, or the modules included in, the device provided by the embodiments of the present disclosure may be used to execute the method described in the method embodiments above; for its specific implementation, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
An embodiment of the present disclosure further provides a computer-readable storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the method above. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to perform the method above.
Fig. 9 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to Fig. 9, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, video, and so on. The memory 804 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 supplies power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors that provide status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and the keypad of the electronic device 800; the sensor component 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as the memory 804 including computer program instructions, which are executable by the processor 820 of the electronic device 800 to complete the above methods.
Fig. 10 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 10, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more modules, each of which corresponds to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above methods.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as the memory 1932 including computer program instructions, which are executable by the processing component 1922 of the electronic device 1900 to complete the above methods.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to implement aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image-based question answering method, comprising:
obtaining an image feature of a target image and obtaining a question feature of a target question, wherein the target question is a question directed at the target image;
determining a dynamic convolution kernel based on the question feature, and performing a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature; and
determining, based on the fusion feature, an answer to the target question in the target image.
2. The method according to claim 1, wherein performing the convolution operation on the image feature based on the dynamic convolution kernel to obtain the fusion feature comprises:
inputting the dynamic convolution kernel into each of N convolution modules, wherein the convolution modules are connected in sequence to form a linear connection structure, and N is an integer greater than 1;
performing, by the first convolution module in the linear connection structure, a convolution operation on the image feature based on the dynamic convolution kernel to obtain a first output result; and
performing, by the i-th convolution module in the linear connection structure, a convolution operation on the output result of the (i-1)-th convolution module based on the dynamic convolution kernel, the fusion feature being obtained from the N-th convolution module, wherein i is an integer greater than 1 and less than or equal to N.
3. The method according to claim 2, wherein each convolution module includes at least one convolution unit group, and performing the convolution operation by each convolution module comprises:
dividing input data of the convolution module into at least one group of feature sub-data, wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding the convolution module;
distributing each group of feature sub-data to a corresponding convolution unit group, and assigning the dynamic convolution kernel to at least one convolution unit group, wherein each convolution unit group other than the at least one convolution unit group assigned the dynamic convolution kernel is assigned a default convolution kernel; and
performing, by each convolution unit group, a convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel.
4. The method according to claim 3, wherein performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel or default convolution kernel comprises:
selecting first-part data from first feature data obtained by the convolution unit group that performs the convolution operation using the dynamic convolution kernel, the first feature data being data generated by the convolution units that perform the convolution operation using the dynamic convolution kernel;
replacing second-part data of second feature data, obtained by the convolution unit group that performs the convolution operation using the default convolution kernel, with the first-part data to obtain third feature data; and
obtaining the output result of the corresponding convolution module based on the first feature data and the third feature data.
5. The method according to claim 2, wherein each convolution module includes at least one convolution unit group, and performing the convolution operation by each convolution module comprises:
dividing input data of the convolution module into at least one group of feature sub-data, wherein the number of groups of feature sub-data corresponds to the number of convolution unit groups, and the input data includes the image feature or the output result of the convolution module preceding the convolution module;
distributing each group of feature sub-data to a corresponding convolution unit group, and assigning the dynamic convolution kernel to each convolution unit group; and
performing, by each convolution unit group, a convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel.
6. The method according to claim 5, wherein performing, by each convolution unit group, the convolution operation on the input feature sub-data based on the assigned dynamic convolution kernel comprises:
obtaining the feature data output by each convolution unit group; and
obtaining the output result of the corresponding convolution module based on the feature data output by each convolution unit group.
7. The method according to claim 1, wherein determining, based on the fusion feature, the answer to the target question in the target image comprises:
analyzing the fusion feature using a preset classification model to obtain the answer to the target question.
8. An image-based question answering device, comprising:
a feature acquisition module configured to obtain an image feature of a target image and a question feature of a target question, wherein the target question is a question directed at the target image;
a feature fusion module configured to determine a dynamic convolution kernel based on the question feature and to perform a convolution operation on the image feature based on the dynamic convolution kernel to obtain a fusion feature; and
a determination module configured to determine, based on the fusion feature, an answer to the target question in the target image.
9. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
CN201810884844.5A 2018-08-06 2018-08-06 Image-based question and answer processing method and device, electronic equipment and storage medium Active CN109145970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810884844.5A CN109145970B (en) 2018-08-06 2018-08-06 Image-based question and answer processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810884844.5A CN109145970B (en) 2018-08-06 2018-08-06 Image-based question and answer processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109145970A true CN109145970A (en) 2019-01-04
CN109145970B CN109145970B (en) 2021-01-12

Family

ID=64791740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810884844.5A Active CN109145970B (en) 2018-08-06 2018-08-06 Image-based question and answer processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109145970B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140201126A1 (en) * 2012-09-15 2014-07-17 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
CN106649542A (en) * 2015-11-03 2017-05-10 百度(美国)有限责任公司 Systems and methods for visual question answering
EP3166049A1 (en) * 2015-11-03 2017-05-10 Baidu USA LLC Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
CN105701507A (en) * 2016-01-13 2016-06-22 吉林大学 Image classification method based on dynamic random pooling convolution neural network
CN107679582A (en) * 2017-10-20 2018-02-09 深圳市唯特视科技有限公司 A kind of method that visual question and answer are carried out based on multi-modal decomposition model
CN108170816A (en) * 2017-12-31 2018-06-15 厦门大学 A kind of intelligent vision Question-Answering Model based on deep neural network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016577A (en) * 2019-05-30 2020-12-01 华为技术有限公司 Image processing method and device
CN110222770A (en) * 2019-06-10 2019-09-10 成都澳海川科技有限公司 A kind of vision answering method based on syntagmatic attention network
CN110222770B (en) * 2019-06-10 2023-06-02 成都澳海川科技有限公司 Visual question-answering method based on combined relationship attention network
WO2021110147A1 (en) * 2019-12-06 2021-06-10 阿里巴巴集团控股有限公司 Methods and apparatuses for image processing, image training and channel shuffling
CN113052756A (en) * 2019-12-27 2021-06-29 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN111488887A (en) * 2020-04-09 2020-08-04 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence
CN111488887B (en) * 2020-04-09 2023-04-18 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence
CN114581746A (en) * 2022-03-04 2022-06-03 北京百度网讯科技有限公司 Object detection method, device, equipment and medium
CN114581746B (en) * 2022-03-04 2023-09-05 北京百度网讯科技有限公司 Object detection method, device, equipment and medium

Also Published As

Publication number Publication date
CN109145970B (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN109145970A (en) Question and answer treating method and apparatus, electronic equipment and storage medium based on image
CN110210535A (en) Neural network training method and device and image processing method and device
CN109800737A (en) Face recognition method and device, electronic equipment and storage medium
CN109800744A (en) Image clustering method and device, electronic equipment and storage medium
CN110378976A (en) Image processing method and device, electronic equipment and storage medium
CN108764069A (en) Liveness detection method and device
CN109816611A (en) Video inpainting method and device, electronic equipment and storage medium
CN109697734A (en) Pose estimation method and device, electronic equipment and storage medium
CN109614613A (en) Descriptive statement localization method and device for images, electronic equipment and storage medium
CN110189249A (en) Image processing method and device, electronic equipment and storage medium
CN109658352A (en) Image information optimization method and device, electronic equipment and storage medium
CN107944409A (en) Video analysis method and device
CN110060262A (en) Image segmentation method and device, electronic equipment and storage medium
CN109919300A (en) Neural network training method and device and image processing method and device
CN110298310A (en) Image processing method and device, electronic equipment and storage medium
CN109615006A (en) Character recognition method and device, electronic equipment and storage medium
CN109829863A (en) Image processing method and device, electronic equipment and storage medium
CN109902738A (en) Network module and distribution method and device, electronic equipment and storage medium
CN109165738A (en) Neural network model optimization method and device, electronic equipment and storage medium
CN109255784A (en) Image processing method and device, electronic equipment and storage medium
CN109858614A (en) Neural network training method and device, electronic equipment and storage medium
CN109978891A (en) Image processing method and device, electronic equipment and storage medium
CN109859144A (en) Image processing method and device, electronic equipment and storage medium
CN109711546A (en) Neural network training method and device, electronic equipment and storage medium
CN110458218A (en) Image classification method and device, classification network training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant