CN114743018A - Image description generation method, device, equipment and medium - Google Patents

Image description generation method, device, equipment and medium Download PDF

Info

Publication number
CN114743018A
Authority
CN
China
Prior art keywords
image
preset
attention
detected
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210423256.8A
Other languages
Chinese (zh)
Other versions
CN114743018B (en)
Inventor
舒畅
陈又新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210423256.8A priority Critical patent/CN114743018B/en
Publication of CN114743018A publication Critical patent/CN114743018A/en
Application granted granted Critical
Publication of CN114743018B publication Critical patent/CN114743018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides an image description generation method, device, equipment and medium. The method comprises the following steps: inputting an image to be detected into a preset target detection model for identification, and outputting the region characteristics of the image to be detected; inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected; inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting the output value of the encoder; and inputting the output value and the category embedding into a decoder of the preset transformation model for processing to generate a description text of the image to be detected. The invention also relates to the technical field of blockchains, and the region characteristics and the category embedding can be stored in a node of a blockchain.

Description

Image description generation method, device, equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an image description generation method, device, equipment and medium.
Background
Image description (Image Captioning) is a comprehensive emerging discipline that merges computer vision technology, natural language processing technology, and machine learning technology. The purpose of image description is to automatically generate a piece of descriptive text according to the picture content.
With the popularity of the Transformer model in the NLP field, many Transformer-based image description methods have been developed and have demonstrated better performance than most conventional methods. Compared with the Transformer used for natural language processing, these methods improve the input position coding and the attention-mechanism module in the encoder portion so that the model better adapts to images as input.
However, the current method cannot integrate abstract features such as the relationship between image objects and the mapping relationship between the objects and corresponding labels into an attention mechanism, and the obtained description information is not accurate and rich enough.
Disclosure of Invention
In view of the above, the present invention provides an image description generation method, apparatus, device and medium, which aims to solve the technical problem in the prior art that the description information generated for an image is not accurate and rich enough.
In order to achieve the above object, the present invention provides an image description generating method, including:
inputting an image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected;
inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected;
inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder;
and inputting the output value and the category embedding into a decoder of the preset transformation model for processing, to generate a description text of the image to be detected.
Preferably, the inputting the image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected includes:
according to a preset geometric relation calculation formula, carrying out frame recognition on a target contained in the image to be detected to obtain frames of the target and a target category of each frame;
and adjusting the size of the frame to a preset range, and outputting the regional characteristics of the image to be detected.
Preferably, the preset geometric relationship calculation formula includes:

ξ(a, b) = (log(|x_a − x_b| / w_a), log(|y_a − y_b| / h_a), log(w_b / w_a), log(h_b / h_a))^T

where ξ(a, b) is the regional relation characteristic of the image to be detected, (x_a, y_a) are the center-point coordinates of the a-th frame of the image to be detected, (x_b, y_b) are the center-point coordinates of the b-th frame of the image to be detected, (w_a, h_a) are the width and height of the a-th frame, and (w_b, h_b) are the width and height of the b-th frame.
Preferably, the inputting the region feature into a preset tag attention model for weight calculation and outputting the category embedding of the image to be detected includes:
matching the target category of the image to be detected with preset words of a preset multi-dimensional dictionary according to a preset matching formula to obtain a predicted word and a target label of the target category;
and coding and embedding the predictive words according to a preset first attention formula to obtain the category embedding of the image to be detected.
Preferably, the preset tag attention model includes a plurality of attention modules, each of which includes an independent scaled dot-product attention function, and the encoding and embedding of the predicted word according to the preset first attention formula to obtain the category embedding of the image to be detected includes:
a1, inputting the predicted word into a matrix of a first attention module for weight calculation according to the preset first attention calculation formula and the scaling dot product attention function, and outputting a first weight value of the first attention module;
a2, inputting the first weight into a matrix of a second attention module for weight calculation, and outputting a second weight value of the second attention module;
and A3, repeating A1-A2 to obtain the weight values of all attention modules, splicing all weight values according to a series splicing function, and outputting the category embedding of the image to be detected.
Preferably, the encoder includes a plurality of identical encoding layers, each encoding layer includes a multi-head self-attention sublayer and a position feedforward sublayer, the multi-head self-attention sublayer includes a plurality of parallel head modules, the inputting the region feature into the encoder of the preset transformation model for processing, and outputting the output value of the encoder includes:
b1, inputting the geometric features of the region features into a matrix of a first parallel head module in a first coding layer for weight calculation according to a preset second attention calculation formula, and outputting a first result value of the first parallel head module;
b2, inputting the first result value into a matrix of a second parallel head module for weight calculation, and outputting a second result value of the second parallel head module;
b3, repeating B1-B2 to obtain the result values of all the parallel head modules, splicing all the result values according to a preset splicing formula, inputting the spliced result values into the position feedforward sub-layer block for nonlinear transformation, and inputting the transformed result values into a second coding layer of the coder;
b4, repeating B1-B3 to obtain the output values of all the encoding layers.
Preferably, the decoder includes a plurality of identical decoding layers, each decoding layer includes a masked multi-head self-attention sublayer, a multi-head cross-attention sublayer and a position forward sublayer, and the inputting of the output value and the category embedding into the decoder of the preset transformation model for processing to generate the description text of the image to be detected includes:
position embedding is carried out on the output value of the last encoding layer to serve as the input of the masked multi-head self-attention sublayer, and an input word vector is obtained;
embedding and inputting each output value, the input word vector and the category of the target into the multi-head cross attention sublayer for cross attention calculation to obtain a weight matrix;
and inputting the weight matrix into the position forward sublayer to calculate to generate a plurality of keywords, and splicing all the keywords to generate a description text of the image to be detected.
To achieve the above object, the present invention also provides an image description generating apparatus, comprising:
an identification module: used for inputting an image to be detected into a preset target detection model for identification, and outputting the region characteristics of the image to be detected;
a calculation module: used for inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected;
an output module: used for inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder;
a generation module: used for inputting the output value and the category embedding into a decoder of the preset transformation model for processing, to generate a description text of the image to be detected.
In order to achieve the above object, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a program executable by the at least one processor to enable the at least one processor to perform the image description generation method of any one of claims 1 to 7.
To achieve the above object, the present invention further provides a computer-readable medium storing an image description generation program which, when executed by a processor, implements the steps of the image description generation method according to any one of claims 1 to 7.
The invention is composed of a preset target detection model, a preset label attention model and a Transformer model (a preset transformation model). And identifying and classifying the image to be detected according to a preset target detection model, establishing a geometric relationship and a position relationship between any two targets by combining an identification frame in the identification process, and outputting the category and the region characteristics of the target of the image to be detected.
According to the preset label attention model and the preset multi-dimensional dictionary, important category embeddings are assigned to targets that frequently appear in the region characteristics to serve as new labels, and the category embedding and the output values of the encoder of the preset transformation model are input into the decoding stage of the preset transformation model to generate the description information of the image to be detected. This improves the correctness of the relations between targets in the image description information, so that the description content is richer.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a preferred embodiment of the image description generation method of the present invention;
FIG. 2 is a block diagram of an image description generating apparatus according to a preferred embodiment of the present invention;
FIG. 3 is a diagram of an electronic device according to a preferred embodiment of the present invention;
the objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use the knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The invention provides an image description generation method. Referring to fig. 1, a method flow diagram of an embodiment of the image description generation method of the present invention is shown. The method may be performed by an electronic device, which may be implemented by software and/or hardware. The image description generation method includes the following steps S10-S40:
step S10: and inputting the image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected.
The specific step S10 includes:
according to a preset geometric relation calculation formula, carrying out frame recognition on a target contained in the image to be detected to obtain frames of the target and a target category of each frame;
and adjusting the size of the frame to a preset range, and outputting the regional characteristics of the image to be detected.
In this embodiment, the preset target detection model includes, but is not limited to, the Faster R-CNN target detection model, which integrates feature extraction, bounding-box regression, and classification. The MSCOCO data set is taken as the classification database of the preset target detection model, and the preset target detection model carries out target detection on the input image to be detected as follows: first, the convolutional layers (Conv layers) of the preset target detection model extract image features from the image; the image features are shared by the RPN layer and the fully connected layer for calculation, and frame identification is carried out according to the preset geometric relation calculation formula, obtaining the frames of the targets and the target category of each frame, wherein the target categories comprise foreground information and background information (for example, the foreground information is a target object of the image).
In the process of frame identification, in order to better cover the characteristics of the image to be detected, the frame is coded, and four coordinate parameters

(x, y, w, h)

represent the position information of the anchor point and the real frame; the four coordinate parameters respectively represent the center-point coordinates, the width and the height of the target frame. Through linear regression learning of these four scalars, the anchor point is continuously brought closer to the real frame, so that the frame of the target in the image to be detected is accurately obtained.
To facilitate description text generation, a bilinear interpolation method is applied to the image feature mapping region corresponding to the target, and the size of the frame is adjusted to a preset range (for example, to the edge pixels of the target in the image), finally obtaining the region characteristics of the input image. In order to take the geometric relationship and the position relationship between targets into account when generating the description text, a geometric relationship and a position relationship can be established between any two targets based on the frames obtained in target detection, where the geometric relationship represents the relationship between different targets in the image to be detected, and the position relationship represents the position of a target in the image to be detected.
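As an illustrative sketch of this pooling step (not the patent's implementation), torchvision's roi_align resizes each detected frame to a fixed output size via bilinear interpolation; the feature-map shape, the 1/16 stride and the 7×7 output size below are assumptions:

```python
import torch
from torchvision.ops import roi_align

# Toy backbone feature map for one image and two detected frames;
# each frame row is (batch_index, x1, y1, x2, y2) in image coordinates.
feature_map = torch.randn(1, 256, 38, 50)
frames = torch.tensor([[0., 45., 60., 210., 300.],
                       [0., 10., 20., 120., 150.]])

region_features = roi_align(
    feature_map, frames,
    output_size=(7, 7),      # every frame is resized to the same preset range
    spatial_scale=1.0 / 16,  # assumed stride mapping image coords to the map
    aligned=True,            # bilinear interpolation without coordinate offset
)
print(region_features.shape)  # torch.Size([2, 256, 7, 7])
```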
In one embodiment, the preset geometric relationship calculation formula includes:

ξ(a, b) = (log(|x_a − x_b| / w_a), log(|y_a − y_b| / h_a), log(w_b / w_a), log(h_b / h_a))^T

where ξ(a, b) is the regional relation characteristic of the image to be detected, (x_a, y_a) are the center-point coordinates of the a-th frame of the image to be detected, (x_b, y_b) are the center-point coordinates of the b-th frame of the image to be detected, (w_a, h_a) are the width and height of the a-th frame, and (w_b, h_b) are the width and height of the b-th frame.

The geometric relationship and the position relationship between two targets of the image to be detected can be obtained through ξ(a, b), and the geometric feature η^G_(a,b) between targets in different regions can be obtained through transformation. The geometric feature η^G_(a,b) is calculated by formula (2):

η^G_(a,b) = ReLU(Emb(ξ(a, b)) · w_G)    (2)

where Emb embeds the geometric features between targets, mapping the relation vector ξ(a, b) between targets to a higher dimension, and w_G is a learnable vector that projects the embedded vector to a scalar.
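A minimal Python sketch of this step, assuming the log-ratio form of ξ(a, b) given above and a 64-dimensional embedding (both assumptions):

```python
import torch
import torch.nn as nn

def xi(box_a, box_b):
    # Pairwise relation xi(a, b); each box is (x_center, y_center, w, h).
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    return torch.stack([
        torch.log((xa - xb).abs().clamp(min=1e-3) / wa),
        torch.log((ya - yb).abs().clamp(min=1e-3) / ha),
        torch.log(wb / wa),
        torch.log(hb / ha),
    ])

emb = nn.Linear(4, 64)              # Emb: map the relation vector to 64 dims
w_g = nn.Linear(64, 1, bias=False)  # w_G: project the embedding to a scalar

box_a = torch.tensor([100.0, 80.0, 50.0, 40.0])
box_b = torch.tensor([160.0, 90.0, 30.0, 60.0])
eta_g = torch.relu(w_g(emb(xi(box_a, box_b))))  # scalar geometric feature
print(eta_g.item())
```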
Step S20: and inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected.
The specific step S20 includes:
matching the target category of the image to be detected with preset words of a preset multi-dimensional dictionary according to a preset matching formula to obtain a predicted word and a target label of the target category;
and coding and embedding the predictive words according to a preset first attention formula to obtain the category embedding of the image to be detected.
In an embodiment, the preset tag attention model includes a plurality of attention modules, each of which includes an independent scaled dot-product attention function, and the encoding and embedding of the predicted word according to the preset first attention formula to obtain the category embedding of the image to be detected includes:
a1, inputting the predicted word into a matrix of a first attention module for weight calculation according to the preset first attention calculation formula and the scaling dot product attention function, and outputting a first weight value of the first attention module;
a2, inputting the first weight into a matrix of a second attention module for weight calculation, and outputting a second weight value of the second attention module;
and A3, repeating A1-A2 to obtain the weight values of all attention modules, splicing all weight values according to a series splicing function, and outputting the category embedding of the image to be detected.
In one embodiment, the preset matching formula includes:

L_i = Emb(D(w_j)), when C_i == D(w_j)

where L_i is the i-th target label of the image to be detected, C_i is the i-th target category of the image to be detected, and C_i == D(w_j) means that the i-th target category of the image to be detected corresponds to the j-th preset word of the preset multi-dimensional dictionary.
In one embodiment, in order to give more weight to more important and more frequently occurring target categories, the ranking of the i-th target label among all detected targets is calculated on the basis of L_i, specifically: R_i = L_i * Pr(C_i), where Pr(C_i) is the probability of the target category corresponding to the i-th target label among all categories.
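A toy sketch of the matching and ranking steps, with a hypothetical dictionary and made-up class probabilities (all values are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical dictionary D and toy detections; every value is illustrative.
dictionary = {"person": 0, "dog": 1, "frisbee": 2}  # preset multi-dimensional dictionary
emb = nn.Embedding(len(dictionary), 64)             # Emb in L_i = Emb(D(w_j))

detections = [("dog", 0.82), ("person", 0.91)]      # (target category C_i, Pr(C_i))
ranked_labels = []
for category, prob in detections:
    if category in dictionary:                        # C_i == D(w_j)
        l_i = emb(torch.tensor(dictionary[category])) # target label L_i
        ranked_labels.append(l_i * prob)              # R_i = L_i * Pr(C_i)
print(torch.stack(ranked_labels).shape)               # torch.Size([2, 64])
```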
In one embodiment, the preset first attention calculation formula includes:

L_Att = σ(MHA(L, R_i, L))

where L_Att is the category embedding of the image to be detected, σ is the sigmoid activation function, L is the region characteristic, and R_i is the ranking of the i-th target label among all detected targets.
In one embodiment, the calculation formulas of the scaled dot-product attention functions include:

Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d) V_i

head_i = Attention(Q_i, K_i, V_i)

Q_i = W_q Q, V_i = W_v V, K_i = W_k K

MHA(Q, K, V) = Concat(head_1, …, head_h) W^O

where d is the dimension of the low-dimensional vectors input from the image to be detected, Q, K and V are respectively the query, key and value matrices of the preset label attention model, Concat is the serial splicing function, head_1, …, head_h are the h attention modules of the preset label attention model, W^O is the output weight matrix, and Attention is the attention function used to calculate the weight values.
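The following sketch implements the standard scaled dot-product attention and multi-head concatenation assumed above, applied as L_Att = σ(MHA(L, R, L)); the head count and dimensions are assumptions, not the patent's configuration:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
    d = q.size(-1)
    weights = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
    return weights @ v

class TagAttention(nn.Module):
    # Minimal sketch of L_Att = sigmoid(MHA(L, R, L)).
    def __init__(self, d_model=64, heads=4):
        super().__init__()
        self.heads, self.d_head = heads, d_model // heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)   # W^O

    def forward(self, q, k, v):
        n = q.size(0)
        split = lambda x: x.view(n, self.heads, self.d_head).transpose(0, 1)
        heads = scaled_dot_product_attention(
            split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v)))
        concat = heads.transpose(0, 1).reshape(n, -1)  # Concat(head_1..head_h)
        return torch.sigmoid(self.w_o(concat))          # sigma(MHA(...))

L = torch.randn(10, 64)           # region characteristics of 10 detected frames
R = torch.randn(10, 64)           # ranked target labels R_i
l_att = TagAttention()(L, R, L)   # category embedding L_Att
print(l_att.shape)                # torch.Size([10, 64])
```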
In one embodiment, before the step S20, the method further includes:
acquiring a plurality of preset words prestored corresponding to different images in a preset corpus;
and calculating word frequency values of all preset words appearing in the preset corpus, and constructing the preset multi-dimensional dictionary according to the preset words with the word frequency values larger than a preset value.
The corpus of the MSCOCO data set is made up of a large number of image descriptions (explanatory texts) corresponding to images, where each image may correspond to multiple image descriptions. The image descriptions are composed of preset words; the preset words whose number of occurrences across all image descriptions is greater than a preset value (for example, 5) are used to construct the preset multi-dimensional dictionary, which serves as the reference for generating target labels. Building the dictionary from preset words occurring more than 5 times makes the generated image descriptions read more naturally.
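A minimal sketch of this dictionary construction, using toy captions in place of the MSCOCO corpus:

```python
from collections import Counter

# Toy captions standing in for the MSCOCO image descriptions.
captions = [
    "a dog catches a frisbee in the park",
    "a brown dog jumps for a frisbee",
    # ... one entry per image description in the corpus
]
frequency = Counter(word for caption in captions for word in caption.split())

PRESET_VALUE = 5   # keep only words occurring more than the preset value
dictionary = {word: idx for idx, (word, count) in enumerate(frequency.items())
              if count > PRESET_VALUE}
```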
Step S30: and inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder.
In step S30, the encoder includes a plurality of identical encoding layers, each encoding layer includes a multi-head self-attention sublayer and a position feedforward sublayer, and the multi-head self-attention sublayer includes a plurality of parallel head modules. Step S30 includes:
b1, inputting the geometric features of the region features into a matrix of a first parallel head module in a first coding layer for weight calculation according to a preset second attention calculation formula, and outputting a first result value of the first parallel head module;
b2, inputting the first result value into a matrix of a second parallel head module for weight calculation, and outputting a second result value of the second parallel head module;
b3, repeating B1-B2 to obtain the result values of all the parallel head modules, splicing all the result values according to a preset splicing formula, inputting the spliced result values into the position feedforward sub-layer block for nonlinear transformation, and inputting the transformed result values into a second coding layer of the coder;
b4, repeating B1-B3 to obtain the output values of all the encoding layers.
In one embodiment, the inputting of the geometric features of the region characteristics into a matrix of the first parallel head module in the first encoding layer for weight calculation and outputting a first result value of the first parallel head module includes:
activating a scaling dot product attention function corresponding to the first parallel head module, and mapping the geometric characteristics of the region characteristics to a matrix of the first parallel head module for characteristic embedding;
and embedding the relation vector between the targets into different sub-modules of the multi-head self-attention sublayer for fusion by adjusting the weight parameters, and outputting a first result value of the first parallel head module.
In one embodiment, the preset second attention calculation formula includes:

h_i(Q, K, V, η) = Attention(Q, K, V, η) = softmax(η_i) V_i, i ∈ [1, N]

where η is the geometric feature to be fused into the image to be detected, h_i is the i-th parallel head module of the multi-head self-attention sublayer, Attention is the attention function, and Q, K and V are the query, key and value matrices of the multi-head self-attention sublayer.

The η in each h_i is calculated as:

η_(a,b) = (η^G_(a,b) · exp(η^A_(a,b))) / Σ_l (η^G_(a,l) · exp(η^A_(a,l)))

where η^G_(a,b) is the geometric relation between different targets of the image to be detected (formula (2)), η^A_(a,b) is the appearance-based attention weight between frames a and b, and η_(a,b) is the attention weight of the image to be detected after the geometric relation is fused in.
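A sketch of one such geometry-aware head, assuming the fusion rule above (the exponential-rescaling form used in object-relation transformers; whether the patent uses exactly this form is an assumption):

```python
import math
import torch

def geometry_aware_attention(q, k, v, eta_g):
    # One parallel head h_i = softmax(eta_i) V_i, where the geometric
    # feature eta^G rescales the appearance weights before normalization.
    d = q.size(-1)
    eta_a = q @ k.transpose(-2, -1) / math.sqrt(d)   # appearance weights
    weights = eta_g * torch.exp(eta_a)               # fold in the geometry
    eta = weights / weights.sum(-1, keepdim=True).clamp(min=1e-6)
    return eta @ v

n, d = 10, 64
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
eta_g = torch.relu(torch.randn(n, n))   # pairwise geometric features eta^G
print(geometry_aware_attention(q, k, v, eta_g).shape)  # torch.Size([10, 64])
```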
In one embodiment, the preset splicing formula includes:

MHA(Ω) = Concat(h_1, …, h_h) W^O

where Ω is the initial value of the image to be detected, Concat is the splicing function, h_1, …, h_h are the h parallel head modules of the multi-head self-attention sublayer, and W^O is the weight matrix.
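As a sketch of steps B1-B4, one encoding layer can be assembled from a multi-head attention sublayer (which computes the parallel heads and applies W^O) and a position feedforward sublayer, then stacked; geometry fusion is omitted for brevity and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    # One encoding layer: parallel heads plus splicing inside
    # MultiheadAttention, followed by the position feedforward sublayer.
    def __init__(self, d_model=64, heads=4, d_ff=256):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn, _ = self.mha(x, x, x)            # B1-B3: heads, then splicing
        x = self.norm1(x + attn)
        return self.norm2(x + self.ffn(x))     # nonlinear transformation

encoder = nn.Sequential(*[EncodingLayer() for _ in range(6)])  # identical layers
out = encoder(torch.randn(1, 10, 64))   # region characteristics -> output value
print(out.shape)  # torch.Size([1, 10, 64])
```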
Step S40: inputting the output value and the category embedding into a decoder of the preset transformation model for processing, to generate a description text of the image to be detected.
In step S40, the decoder includes a plurality of identical decoding layers, and each decoding layer includes a masked multi-head self-attention sublayer, a multi-head cross-attention sublayer and a position forward sublayer. Step S40 includes:
position embedding is carried out on the output value of the last encoding layer to serve as the input of the masked multi-head self-attention sublayer, and an input word vector is obtained;
embedding and inputting each output value, the input word vector and the category of the target into the multi-head cross attention sublayer for cross attention calculation to obtain a weight matrix;
and inputting the weight matrix into the position forward sublayer to calculate to generate a plurality of keywords, and splicing all the keywords to generate a description text of the image to be detected.
In this embodiment, the predicted words of the target categories are first position-coded, and the coded predicted words are input into the masked multi-head self-attention sublayer to obtain the word vectors of the weighted sentence; these word vectors serve as the V vector of the first multi-head cross-attention sublayer. The output value of the last encoder layer is converted into the Q and K vectors through two linear conversion layers, and multi-head attention is then computed with the V vector to obtain a V vector (i.e., the input word vector) fused with similarity information.
After the operations of all 6 decoder layers, and according to the vocabulary of the preset transformation model, which gathers the word information of the real sentences corresponding to each training picture, the output vector is passed through one linear layer and one softmax layer to obtain the next keyword.
All the keywords are spliced to generate a plurality of output sentences; a beam search method with the beam size set to 2 is adopted, the evaluation index score of each output sentence is obtained, and the sentence with the highest score is selected as the description text.
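A toy beam-search sketch with beam size 2, with a dummy step function standing in for the 6-layer decoder plus the linear and softmax layers (the token ids and vocabulary size are made up):

```python
import torch
import torch.nn.functional as F

def beam_search(step_fn, bos_id, eos_id, beam_size=2, max_len=20):
    # step_fn maps a partial token sequence to next-token log-probabilities;
    # in the described method it would be the decoder followed by the
    # linear and softmax layers.
    beams = [([bos_id], 0.0)]                  # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_id:           # finished sentences keep scores
                candidates.append((tokens, score))
                continue
            top = torch.topk(step_fn(tokens), beam_size)
            for lp, idx in zip(top.values, top.indices):
                candidates.append((tokens + [idx.item()], score + lp.item()))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(beams, key=lambda c: c[1])[0]   # highest-scoring sentence

vocab_size = 50   # made-up vocabulary size for the demo
step = lambda tokens: F.log_softmax(torch.randn(vocab_size), dim=-1)
print(beam_search(step, bos_id=1, eos_id=2))
```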
In one embodiment, the inputting of each output value, the input word vector and the category embedding of the target into the multi-head cross-attention sublayer for cross-attention calculation includes:
blending the output value with the category embedding of the target according to a preset blending calculation formula to obtain a blended value; and
performing weight calculation on the blended value and the input word vector according to a preset weight calculation formula to obtain the cross-attention matrix.
In one embodiment, the preset cross-attention calculation formula includes:

MA(X̄, Y) = Σ_(i=1..N) α_i ⊙ C(X̄^i, Y)

C(X̄^i, Y) = Attention(W_q Y, W_k X̄^i, W_v X̄^i)

where MA is the fusion connection attention module of the multi-head cross-attention sublayer, α_i is a weight matrix of the same size as the cross-attention result whose weights adjust the contribution degree of each layer of the encoder output, X̄^i is the blended value derived from the output of the i-th encoding layer, and Y is the input word vector.
In one embodiment, the preset blending calculation formula for the blended value includes:

X̄^i = X̃^i + L_Att

where X̄^i is the blended value, X̃^i is the output value of the i-th encoding layer, and L_Att is the category embedding of the target.
In one embodiment, the preset weight calculation formula includes:

α_i = σ(W_i [Y, C(X̄^i, Y)] + b_i)

where [·, ·] is the merge (concatenation) operation, σ is the sigmoid activation function, W_i ∈ R^(2d×d) is a weight matrix, and b_i is a learnable bias parameter.
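A sketch of this gating, assuming the concatenation-then-sigmoid form above with illustrative dimensions:

```python
import torch
import torch.nn as nn

d = 64
w_i = nn.Linear(2 * d, d)   # W_i in R^(2d x d), with learnable bias b_i

def gated_contribution(y, c_i):
    # alpha_i = sigma(W_i [Y, C(X_i, Y)] + b_i); the gate has the same size
    # as the cross-attention result and scales that layer's contribution.
    alpha_i = torch.sigmoid(w_i(torch.cat([y, c_i], dim=-1)))
    return alpha_i * c_i    # one term of the sum over encoding layers

y = torch.randn(10, d)      # input word vectors Y
c_1 = torch.randn(10, d)    # cross-attention result for encoding layer 1
print(gated_contribution(y, c_1).shape)  # torch.Size([10, 64])
```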
Since the sequence in the encoder is input all at once, all input information can be accessed when the multi-head self-attention sublayer is calculated. In the decoder, however, to ensure that at each time step only the sequence information output before the current time can be seen, the masked multi-head self-attention sublayer is introduced; the input word vector is the result of passing the input information through this masked sublayer.
Each multi-head cross-attention sublayer is followed by a regularization Add & Norm layer, a position forward sublayer (FFN layer) and another Add & Norm layer, which convert the input of each sublayer to have the same mean and variance, thereby accelerating convergence. The calculation formula is as follows:

AddNorm(X) = LayerNorm(X + Sublayer(X))
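A minimal sketch of the Add & Norm step, assuming the residual-plus-layer-normalization form above:

```python
import torch
import torch.nn as nn

class AddNorm(nn.Module):
    # Residual connection followed by layer normalization, rescaling each
    # sublayer input to a common mean and variance.
    def __init__(self, d_model=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sublayer_out):
        return self.norm(x + sublayer_out)

x = torch.randn(10, 64)
print(AddNorm()(x, torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```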
and finally, determining the next output keyword according to the output characteristics of the last decoding layer, wherein the dimension of the characteristics of the output keyword is the same as the dimension of the vocabulary.
Referring to fig. 2, a functional block diagram of the image description generating apparatus 100 according to the present invention is shown.
The image description generation apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the image description generation apparatus 100 may include an identification module 110, a calculation module 120, an output module 130, and a generation module 140. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions of the modules/units are as follows:
the identification module 110: the system comprises a target detection module, a target detection module and a target detection module, wherein the target detection module is used for inputting an image to be detected into the preset target detection module for identification and outputting the regional characteristics of the image to be detected;
the identification module 20: the system is used for inputting the region characteristics into a preset label attention model for weight calculation and outputting the category embedding of the image to be detected;
the output module 130: the encoder is used for inputting the region characteristics into a preset transformation model for processing and outputting an output value of the encoder;
the generation module 140: and the decoder is used for embedding the output value and the category into the preset transformation model for processing, and generating a description text of the image to be detected.
In one embodiment, the inputting the image to be detected into a preset target detection model for recognition, and outputting the region characteristics of the image to be detected includes:
according to a preset geometric relation calculation formula, carrying out frame recognition on a target contained in the image to be detected to obtain frames of the target and a target category of each frame;
and adjusting the size of the frame to a preset range, and outputting the regional characteristics of the image to be detected.
In one embodiment, the preset geometric relationship calculation formula includes:

ξ(a, b) = (log(|x_a − x_b| / w_a), log(|y_a − y_b| / h_a), log(w_b / w_a), log(h_b / h_a))^T

where ξ(a, b) is the regional relation characteristic of the image to be detected, (x_a, y_a) are the center-point coordinates of the a-th frame of the image to be detected, (x_b, y_b) are the center-point coordinates of the b-th frame of the image to be detected, (w_a, h_a) are the width and height of the a-th frame, and (w_b, h_b) are the width and height of the b-th frame.
In one embodiment, the inputting the region feature into a preset tag attention model for weight calculation and outputting the category embedding of the image to be detected includes:
matching the target category of the image to be detected with preset words of a preset multi-dimensional dictionary according to a preset matching formula to obtain a predicted word and a target label of the target category;
and coding and embedding the predictive words according to a preset first attention formula to obtain the category embedding of the image to be detected.
In an embodiment, the preset tag attention model includes a plurality of attention modules, each of which includes an independent scaled dot-product attention function, and the encoding and embedding of the predicted word according to the preset first attention formula to obtain the category embedding of the image to be detected includes:
a1, inputting the predicted word into a matrix of a first attention module for weight calculation according to the preset first attention calculation formula and the scaling dot product attention function, and outputting a first weight value of the first attention module;
a2, inputting the first weight into a matrix of a second attention module for weight calculation, and outputting a second weight value of the second attention module;
and A3, repeating A1-A2 to obtain the weight values of all the attention modules, splicing all the weight values according to a series splicing function, and outputting the category embedding of the image to be detected.
In one embodiment, the encoder includes a plurality of identical encoding layers, each encoding layer includes a multi-headed self-attention sublayer and a position feedforward sublayer, the multi-headed self-attention sublayer includes a plurality of parallel head modules, the inputting the region feature into the encoder of the preset transformation model for processing, and the outputting the output value of the encoder includes:
b1, inputting the geometric features of the region features into a matrix of a first parallel head module in a first coding layer for weight calculation according to a preset second attention calculation formula, and outputting a first result value of the first parallel head module;
b2, inputting the first result value into a matrix of a second parallel head module for weight calculation, and outputting a second result value of the second parallel head module;
b3, repeating B1-B2 to obtain the result values of all the parallel head modules, splicing all the result values according to a preset splicing formula, inputting the spliced result values into the position feedforward sub-layer block for nonlinear transformation, and inputting the transformed result values into a second coding layer of the coder;
b4, repeating B1-B3 to obtain the output values of all the encoding layers.
In one embodiment, the decoder includes a plurality of identical decoding layers, each decoding layer includes a masked multi-head self-attention sublayer, a multi-head cross-attention sublayer and a position forward sublayer, and the inputting of the output values and the category embedding into the decoder of the preset transformation model to generate the description text of the image to be detected includes:
position embedding is carried out on the output value of the last encoding layer to serve as the input of the masked multi-head self-attention sublayer, and an input word vector is obtained;
embedding and inputting each output value, the input word vector and the category of the target into the multi-head cross attention sublayer for cross attention calculation to obtain a weight matrix;
and inputting the weight matrix into the position forward sublayer to calculate to generate a plurality of keywords, and splicing all the keywords to generate a description text of the image to be detected.
Fig. 3 is a schematic diagram of an electronic device 1 according to a preferred embodiment of the invention.
The electronic device 1 includes, but is not limited to: a memory 11, a processor 12, a display 13, and a network interface 14. The electronic device 1 is connected to a network through the network interface 14 to obtain raw data. The network may be a wireless or wired network, such as an intranet, the Internet, a Global System for Mobile communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephony network.
The memory 11 includes at least one type of readable medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or memory of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Of course, the memory 11 may also comprise both an internal storage unit and an external storage device of the electronic device 1. In this embodiment, the memory 11 is generally used for storing the operating system installed in the electronic device 1 and various types of application software, such as the program code of the image description generation program 10. Further, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is typically used for controlling the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, for example, to run the program code of the image description generation program 10.
The display 13 may be referred to as a display screen or display unit. In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch panel, or the like. The display 13 is used for displaying information processed in the electronic device 1 and for displaying a visual work interface, e.g. displaying the results of data statistics.
The network interface 14 may optionally comprise a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the network interface 14 is typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
Fig. 3 only shows the electronic device 1 with the components 11-14 and the image description generation program 10, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a user interface, the user interface may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may further include a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
The electronic device 1 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the above embodiment, the processor 12, when executing the image description generation program 10 stored in the memory 11, may implement the following steps:
inputting an image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected;
inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected;
inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder;
and inputting the output value and the category embedding into a decoder of the preset transformation model for processing, to generate a description text of the image to be detected.
The storage device may be the memory 11 of the electronic device 1, or may be another storage device communicatively connected to the electronic device 1.
For detailed description of the above steps, please refer to the above description of fig. 2 regarding a functional block diagram of an embodiment of the image description generating apparatus 100 and fig. 1 regarding a flowchart of an embodiment of the image description generating method.
In addition, the embodiment of the present invention further provides a computer-readable medium, which may be non-volatile or volatile. The computer-readable medium may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable medium comprises a data storage area and a program storage area; the data storage area stores data created according to the use of blockchain nodes, and the program storage area stores the image description generation program 10, which, when executed by a processor, realizes the following operations:
inputting an image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected;
inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected;
inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder;
and inputting the output value and the category embedding into a decoder of the preset transformation model for processing, to generate a description text of the image to be detected.
The specific implementation of the computer readable medium of the present invention is substantially the same as the specific implementation of the image description generation method, and is not repeated herein.
In another embodiment, in order to further ensure the privacy and security of all the data involved, all the data may be stored in a node of a blockchain; for example, the region characteristics and the category embedding may all be stored in blockchain nodes.
It should be noted that the blockchain in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another using cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, herein are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, an electronic device, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An image description generation method, characterized in that the method comprises:
inputting an image to be detected into a preset target detection model for identification, and outputting the regional characteristics of the image to be detected;
inputting the region characteristics into a preset label attention model for weight calculation, and outputting the category embedding of the image to be detected;
inputting the region characteristics into an encoder of a preset transformation model for processing, and outputting an output value of the encoder;
and inputting the output value and the category embedding into a decoder of the preset transformation model for processing to generate a description text of the image to be detected.
2. The image description generation method of claim 1, wherein the inputting an image to be detected into a preset target detection model for recognition and outputting the region characteristics of the image to be detected comprises:
according to a preset geometric relation calculation formula, carrying out frame recognition on a target contained in the image to be detected to obtain frames of the target and a target category of each frame;
and adjusting the size of the frame to a preset range, and outputting the regional characteristics of the image to be detected.
3. The image description generation method according to claim 2, wherein the preset geometric relationship calculation formula includes:

ξ(a, b) = (log(|x_a − x_b| / w_a), log(|y_a − y_b| / h_a), log(w_b / w_a), log(h_b / h_a))^T

where ξ(a, b) is the regional relation characteristic of the image to be detected, (x_a, y_a) are the center-point coordinates of the a-th frame of the image to be detected, (x_b, y_b) are the center-point coordinates of the b-th frame of the image to be detected, (w_a, h_a) are the width and height of the a-th frame, and (w_b, h_b) are the width and height of the b-th frame.
4. The image description generation method of claim 1, wherein the inputting the region features into a preset label attention model for weight calculation and outputting the category embedding of the image to be detected comprises:
matching the target category of the image to be detected with preset words of a preset multi-dimensional dictionary according to a preset matching formula to obtain a predicted word and a target label of the target category;
and coding and embedding the predictive words according to a preset first attention formula to obtain the category embedding of the image to be detected.
5. The image description generation method of claim 4, wherein the preset tag attention model includes a plurality of attention modules, each of the attention modules includes an independent scaling dot product attention function, and the encoding embedding of the predicted word according to the preset first attention formula to obtain the category embedding of the image to be detected includes:
a1, inputting the predicted word into a matrix of a first attention module for weight calculation according to the preset first attention calculation formula and the scaling dot product attention function, and outputting a first weight value of the first attention module;
a2, inputting the first weight into a matrix of a second attention module for weight calculation, and outputting a second weight value of the second attention module;
and A3, repeating A1-A2 to obtain the weight values of all the attention modules, splicing all the weight values according to a series splicing function, and outputting the category embedding of the image to be detected.
6. The image description generation method according to claim 1, wherein the encoder includes a plurality of identical encoding layers, each encoding layer includes a multi-headed attention sublayer and a position feed-forward sublayer, the multi-headed attention sublayer includes a plurality of parallel head modules, the inputting the region feature into an encoder of a preset transformation model for processing, and outputting an output value of the encoder includes:
b1, inputting the geometric features of the region features into a matrix of a first parallel head module in a first coding layer for weight calculation according to a preset second attention calculation formula, and outputting a first result value of the first parallel head module;
b2, inputting the first result value into a matrix of a second parallel head module for weight calculation, and outputting a second result value of the second parallel head module;
b3, repeating B1-B2 to obtain the result values of all the parallel head modules, splicing all the result values according to a preset splicing formula, inputting the spliced result values into the position feedforward sub-layer block for nonlinear transformation, and inputting the transformed result values into a second coding layer of the coder;
b4, repeating B1-B3 to obtain the output values of all the encoding layers.
7. The image description generation method of claim 1, wherein the decoder includes a plurality of identical decoding layers, each decoding layer includes a masked multi-head self-attention sublayer, a multi-head cross-attention sublayer and a position forward sublayer, and the inputting of the output values and the category embedding into the decoder of the preset transformation model for processing to generate the description text of the image to be detected includes:
position embedding is carried out on the output value of the last encoding layer to serve as the input of the masked multi-head self-attention sublayer, and an input word vector is obtained;
embedding and inputting each output value, the input word vector and the category of the target into the multi-head cross attention sublayer for cross attention calculation to obtain a weight matrix;
and inputting the weight matrix into the position forward sublayer to calculate to generate a plurality of keywords, and splicing all the keywords to generate a description text of the image to be detected.
8. An image description generation apparatus, characterized in that the apparatus comprises:
an identification module, configured to input an image to be detected into a preset target detection model for identification and output the region features of the image to be detected;
a calculation module, configured to input the region features into a preset label attention model for weight calculation and output the category embedding of the image to be detected;
an output module, configured to input the region features into an encoder of a preset transformation model for processing and output an output value of the encoder;
and a generation module, configured to input the output value and the category embedding into a decoder of the preset transformation model for processing to generate a description text of the image to be detected.
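As one reading of how the four claimed modules compose, a thin wrapper is sketched below; each injected callable is a hypothetical stand-in for the corresponding preset model, not the patented apparatus itself.

class ImageDescriptionGenerator:
    # Composes the identification, calculation, output and generation modules.
    def __init__(self, detector, label_attention, encoder, decoder):
        self.detector = detector                 # identification module
        self.label_attention = label_attention   # calculation module
        self.encoder = encoder                   # output module
        self.decoder = decoder                   # generation module

    def describe(self, image):
        region_features = self.detector(image)                # region features
        category_emb = self.label_attention(region_features)  # category embedding
        output_values = self.encoder(region_features)         # encoder output values
        return self.decoder(output_values, category_emb)      # description text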
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a program executable by the at least one processor to enable the at least one processor to perform the image description generation method of any one of claims 1 to 7.
10. A computer-readable storage medium storing an image description generation program that, when executed by a processor, implements the image description generation method of any one of claims 1 to 7.
CN202210423256.8A 2022-04-21 2022-04-21 Image description generation method, device, equipment and medium Active CN114743018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210423256.8A CN114743018B (en) 2022-04-21 2022-04-21 Image description generation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114743018A true CN114743018A (en) 2022-07-12
CN114743018B CN114743018B (en) 2024-05-31

Family

ID=82284146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210423256.8A Active CN114743018B (en) 2022-04-21 2022-04-21 Image description generation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114743018B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019042244A1 (en) * 2017-08-30 2019-03-07 腾讯科技(深圳)有限公司 Image description generation method, model training method and device, and storage medium
CN109753981A (en) * 2017-11-06 2019-05-14 彼乐智慧科技(北京)有限公司 A kind of method and device of image recognition
US20200151448A1 (en) * 2018-11-13 2020-05-14 Adobe Inc. Object Detection In Images
CN109543699A (en) * 2018-11-28 2019-03-29 北方工业大学 Image abstract generation method based on target detection
CN110472688A (en) * 2019-08-16 2019-11-19 北京金山数字娱乐科技有限公司 Method and device of image description, and training method and device of image description model
KR102225024B1 (en) * 2019-10-24 2021-03-08 연세대학교 산학협력단 Apparatus and method for image inpainting
CA3068891A1 (en) * 2020-01-17 2021-07-17 Element Ai Inc. Method and system for generating a vector representation of an image
CN111639594A (en) * 2020-05-29 2020-09-08 苏州遐迩信息技术有限公司 Training method and device of image description model
CN112992308A (en) * 2021-03-25 2021-06-18 腾讯科技(深圳)有限公司 Training method of medical image report generation model and image report generation method
CN113222916A (en) * 2021-04-28 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device and medium for detecting image using target detection model
CN113946706A (en) * 2021-05-20 2022-01-18 广西师范大学 Image description generation method based on reference preposition description
CN113591967A (en) * 2021-07-27 2021-11-02 南京旭锐软件科技有限公司 Image processing method, device and equipment and computer storage medium
CN113609326A (en) * 2021-08-25 2021-11-05 广西师范大学 Image description generation method based on external knowledge and target relation
CN114266905A (en) * 2022-01-11 2022-04-01 重庆师范大学 Image description generation model method and device based on Transformer structure and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIU Bin et al.: "An Image Description Method Based on Attention Mechanism and Multimodality", Journal of Liaoning University (Natural Science Edition), No. 01, 15 February 2019 (2019-02-15), pages 38-45 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821271A (en) * 2022-05-19 2022-07-29 平安科技(深圳)有限公司 Model training method, image description generation device and storage medium
CN114821271B (en) * 2022-05-19 2022-09-16 平安科技(深圳)有限公司 Model training method, image description generation device and storage medium

Also Published As

Publication number Publication date
CN114743018B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN111241304B (en) Answer generation method based on deep learning, electronic device and readable storage medium
CN111695439B (en) Image structured data extraction method, electronic device and storage medium
CN112308237B (en) Question-answer data enhancement method and device, computer equipment and storage medium
CN113792741B (en) Character recognition method, device, equipment and storage medium
CN113344206A (en) Knowledge distillation method, device and equipment integrating channel and relation feature learning
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN110929524A (en) Data screening method, device, equipment and computer readable storage medium
CN110276382B (en) Crowd classification method, device and medium based on spectral clustering
US20230334893A1 (en) Method for optimizing human body posture recognition model, device and computer-readable storage medium
CN111680480A (en) Template-based job approval method and device, computer equipment and storage medium
CN114898219B (en) SVM-based manipulator touch data representation and identification method
CN113886550A (en) Question-answer matching method, device, equipment and storage medium based on attention mechanism
CN114880449B (en) Method and device for generating answers of intelligent questions and answers, electronic equipment and storage medium
CN113807728A (en) Performance assessment method, device, equipment and storage medium based on neural network
CN112036189A (en) Method and system for recognizing gold semantic
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN114743018B (en) Image description generation method, device, equipment and medium
CN114399775A (en) Document title generation method, device, equipment and storage medium
CN114417785A (en) Knowledge point annotation method, model training method, computer device, and storage medium
CN113836929A (en) Named entity recognition method, device, equipment and storage medium
CN113761375A (en) Message recommendation method, device, equipment and storage medium based on neural network
CN113868419A (en) Text classification method, device, equipment and medium based on artificial intelligence
CN110362681B (en) Method, device and storage medium for identifying repeated questions of question-answering system
CN116702761A (en) Text error correction method, device, equipment and storage medium
CN114358579A (en) Evaluation method, evaluation device, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant