WO2023173547A1 - Text image matching method and apparatus, device, and storage medium - Google Patents

Text image matching method and apparatus, device, and storage medium

Info

Publication number
WO2023173547A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate object
candidate
matched
type
similarity
Prior art date
Application number
PCT/CN2022/090161
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
翟尤
周成昊
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023173547A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/35 Clustering; Classification
    • G06F16/50 Information retrieval of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular to a text image matching method, device, equipment and storage medium.
  • Text-image matching is a cross-modal retrieval method: given a piece of natural language text, retrieve images that match the text's description; or, given an image, retrieve text that matches the image's content.
  • the system needs to process the image and the natural language text separately, and then perform matching based on the processing results.
  • features of the image and the natural language text are first extracted through separate feature extraction networks, and the two extracted features are then matched. Because the gap between images and text is large, features from the two modalities are often difficult to match, resulting in low matching accuracy.
  • the main purpose of this application is to provide a text image matching method, device, equipment and storage medium, aiming to solve the technical problem that, in current text image matching, features of the image and the natural language text are extracted separately through feature extraction networks and the two extracted features are then matched, resulting in low matching accuracy.
  • this application proposes a text image matching method, which method includes:
  • a target matching result corresponding to the object to be matched is determined.
  • This application also proposes a text image matching device, which includes:
  • a data acquisition module, used to obtain the object to be matched;
  • a type recognition result determination module, used to perform type recognition on the object to be matched and obtain a type recognition result;
  • a candidate object set determination module, configured to determine a candidate object set from a preset candidate object library according to the type recognition result;
  • a fusion feature extraction module, configured to perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
  • a candidate object feature determination module, configured to perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
  • a single object similarity determination module, used to calculate the similarity between the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity;
  • a target matching result determination module, configured to determine a target matching result corresponding to the object to be matched based on each of the single object similarities and the candidate object set.
  • This application also proposes a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • when the processor executes the computer program, it implements the above text image matching method.
  • the method includes:
  • a target matching result corresponding to the object to be matched is determined.
  • This application also proposes a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the above-mentioned text image matching method is implemented.
  • the method includes:
  • a target matching result corresponding to the object to be matched is determined.
  • In the text image matching method, device, equipment and storage medium of the present application, the method obtains a type recognition result by performing type recognition on the object to be matched; determines a candidate object set from a preset candidate object library according to the type recognition result; performs fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; performs feature extraction on each candidate object in the candidate object set to obtain candidate object features; performs similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and determines, according to each of the single object similarities and the candidate object set, the target matching result corresponding to the object to be matched.
  • Figure 1 is a schematic flow chart of a text image matching method according to an embodiment of the present application
  • Figure 2 is a schematic structural block diagram of a text image matching device according to an embodiment of the present application.
  • Figure 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • an embodiment of the present application provides a text image matching method, which method includes:
  • S4 Perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
  • S7 Determine the target matching result corresponding to the object to be matched according to the similarity of each single object and the candidate object set.
  • a type recognition result is obtained by performing type recognition on the object to be matched; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed based on the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and based on each single object similarity and the candidate object set, a target matching result corresponding to the object to be matched is determined.
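  • The end-to-end flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `classify_type`, `fuse`, and `extract` are hypothetical stand-ins for the text image classification model, the fusion feature extraction model, and the single object feature extraction model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(obj, classify_type, library, fuse, extract, threshold=0.5):
    """Skeleton of the flow: type recognition -> candidate set ->
    fusion features -> candidate features -> similarity -> thresholded argmax."""
    obj_type = classify_type(obj)                      # "text" or "image"
    # Cross-modal lookup: a text query searches the image sub-library and vice versa.
    candidates = library["image" if obj_type == "text" else "text"]
    sims = [cosine_similarity(fuse(obj, c), extract(c)) for c in candidates]
    best = max(range(len(sims)), key=lambda i: sims[i])
    if sims[best] > threshold:
        return candidates[best]                        # hit object
    return None                                        # matching failed
```

With vectors standing in for real objects, a perfect-match candidate is returned, while a candidate below the threshold yields a failed match.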
  • the object to be matched is the object for which text-image matching needs to be performed.
  • the object to be matched is a piece of text or an image.
  • the value range of type identification results includes: text type and image type.
  • the type recognition result is matched against the type identifiers in the candidate object library, and the sub-library corresponding to the sub-library identifier associated with the matched type identifier in the candidate object library is used as the candidate object set.
  • the candidate object library includes: type identifier and sub-library identifier.
  • feature extraction is performed based on the coding of the object to be matched and the coding of each candidate object in the candidate object set, and the extracted features are used as fusion features.
  • the number of fused features is the same as the number of candidate objects in the candidate object set.
  • candidate object features correspond to the candidate objects one-to-one.
  • the number of single-object similarities is the same as the number of candidate objects in the candidate object set.
  • the single-object similarity is cosine similarity
  • when the single object similarity is cosine similarity, the single object similarity with the largest value is found from among the single object similarities, and the candidate object corresponding to the found single object similarity in the candidate object set is used as the hit object of the target matching result corresponding to the object to be matched; when the single object similarity is the Euclidean distance, the single object similarity with the smallest value is found from among the single object similarities, and the candidate object corresponding to it in the candidate object set is used as the hit object of the target matching result corresponding to the object to be matched.
  • the above-mentioned step of performing type identification on the object to be matched and obtaining the type identification result includes:
  • This embodiment uses a text image classification model to perform classification prediction, thereby improving the accuracy of the classification prediction and, in turn, the accuracy of text image matching.
  • the object to be matched is input into a preset text image classification model for classification prediction, and the data obtained by classification prediction is used as the classification prediction result.
  • the text image classification model can use a binary classifier.
  • the classification prediction result is a vector with two elements, corresponding to the text label and the image label respectively; each element is a probability value.
  • when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label, the text-label element is the largest; in this case the object to be matched is a piece of text, so the type recognition result is determined to be a text type.
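  • Assuming the two-element prediction vector is ordered as [text probability, image probability] (an illustrative assumption; the patent does not fix an order), the decision rule reduces to a comparison:

```python
def type_from_prediction(pred):
    """pred: two-element vector of probabilities [text_prob, image_prob].
    The larger element decides the type recognition result."""
    text_prob, image_prob = pred
    return "text" if text_prob > image_prob else "image"
```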
  • the above-mentioned step of determining a candidate object set from a preset candidate object library based on the type recognition result includes:
  • when the type recognition result is a text type, the image sub-library is used as the candidate object set; when the type recognition result is an image type, the text sub-library is used as the candidate object set, thereby providing a basis for fusion feature generation and text-image matching.
  • when the type recognition result is a text type, the object to be matched is a piece of text, and the image sub-library corresponding to the sub-library identifier associated with the text type in the candidate object library is used as the candidate object set; in this case, each candidate object in the candidate object set is an image.
  • when the type recognition result is an image type, the object to be matched is an image, and the text sub-library corresponding to the sub-library identifier associated with the image type in the candidate object library is used as the candidate object set; in this case, each candidate object in the candidate object set is text.
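  • A minimal sketch of this cross-modal sub-library lookup, assuming the candidate object library is a plain mapping from sub-library identifiers to candidate lists (the identifier names are hypothetical):

```python
def select_candidate_set(type_result, library):
    """Cross-modal selection: a text query retrieves from the image
    sub-library, an image query retrieves from the text sub-library."""
    sub_id = "image_sublibrary" if type_result == "text" else "text_sublibrary"
    return library[sub_id]
```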
  • the above step of performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set includes:
  • S44 Splice the first code and the second code in dimensions to obtain a fusion code
  • S45 Input the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  • the object to be matched and the candidate object are first encoded and dimensionally spliced, and the result of the dimension splicing is then input into the fusion feature extraction model for feature extraction, thereby extracting intermediate features between the image and the text and providing a basis for matching the fusion features against the candidate object features.
  • when the type of the candidate object set is a text type, the target object is input into the coding model corresponding to the text type for encoding, and the encoded data is used as the first encoding; when the type of the candidate object set is an image type, the target object is input into the coding model corresponding to the image type for encoding, and the encoded data is used as the first encoding.
  • the coding model adopts a fully connected layer. Because such an encoding model performs only shallow encoding, a large amount of the original information in the target object is retained.
  • the encoding model can also adopt other encoding models, which are not limited here.
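  • A fully connected layer of the kind the coding model adopts computes y = Wx + b; the list-based representation below is purely illustrative:

```python
def fully_connected(x, weights, bias):
    """A single fully connected (dense) layer, sketched in plain Python.
    weights: list of rows (one per output unit); bias: one value per output.
    Such a shallow encoding preserves much of the input's information."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]
```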
  • when the type recognition result is a text type, the object to be matched is input into the coding model corresponding to the text type for encoding, and the encoded data is used as the second encoding; when the type recognition result is an image type, the object to be matched is input into the coding model corresponding to the image type for encoding, and the encoded data is used as the second encoding.
  • optionally, the first encoding and the second encoding are spliced along the dimension in the order of text first and then image, and the spliced data is used as the fusion code; in this case, the order of the fusion code along the dimension is: text encoding, image encoding. Alternatively, the first encoding and the second encoding are spliced along the dimension in the order of image first and then text, in which case the order of the fusion code along the dimension is: image encoding, text encoding.
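  • The fixed splicing order (here: text encoding first, then image encoding) can be sketched as follows, with encodings represented as flat lists of floats; `query_type` is a hypothetical parameter giving the modality of the object to be matched:

```python
def splice_codes(first_code, second_code, query_type):
    """Concatenate the candidate encoding (first_code) and the query
    encoding (second_code) along the feature dimension in a fixed
    modality order: text encoding first, then image encoding."""
    if query_type == "text":
        # query is text (second code), candidate is an image (first code)
        return list(second_code) + list(first_code)
    # query is an image (second code), candidate is text (first code)
    return list(first_code) + list(second_code)
```

Keeping the modality order fixed means the downstream fusion feature extraction model always sees text features and image features in the same positions.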
  • the fusion code is input into a preset fusion feature extraction model for feature extraction, and the extracted features are used as the fusion features corresponding to the target object.
  • the fusion feature extraction model is a model trained based on the ResNet50 network or the UNet network.
  • the ResNet50 network is a deep residual network.
  • the UNet network is a semantic segmentation network.
  • the above-mentioned step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes:
  • S51 Input each candidate object in the candidate object set into the single object feature extraction model corresponding to the type of the candidate object set for feature extraction, and obtain the candidate object feature corresponding to each candidate object.
  • This embodiment uses a single object feature extraction model corresponding to the type of the candidate object set for feature extraction, thereby improving the accuracy of the extracted features and improving the accuracy of text image matching.
  • each candidate object in the candidate object set is input into the single object feature extraction model corresponding to the type of the candidate object set for feature extraction, and the extracted features are used as the candidate object features.
  • when the type of the candidate object set is a text type, the single object feature extraction model corresponding to the type of the candidate object set is a model obtained by training an LSTM network using multiple text training samples; when the type of the candidate object set is an image type, it is a model obtained by training a ResNet50 network or a UNet network using multiple image training samples.
  • LSTM network refers to long short-term memory artificial neural network.
  • Text training samples include: text samples and text feature calibration data.
  • Image training samples include: image samples and image feature calibration data.
  • the above-mentioned step of calculating the similarity between the fusion features and the candidate object features corresponding to the same candidate object to obtain the similarity of a single object includes:
  • S64 Perform cosine similarity calculation on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
  • This embodiment uses cosine similarity for the similarity calculation. Since cosine similarity compares the directions of feature vectors rather than their magnitudes, it tends to give more reliable comparisons, further improving the accuracy of text-image matching.
  • the first feature and the second feature are features corresponding to the same candidate object; therefore, cosine similarity calculation is performed on the first feature and the second feature, and the calculated cosine similarity is used as the single object similarity corresponding to the object to be calculated.
  • the single object similarity corresponding to each candidate object in the candidate object set can be determined.
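  • The cosine similarity used for the single object similarity can be computed directly from the two feature vectors; a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|), for equal-length non-zero vectors.
    Ranges over [-1, 1]; 1 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))
```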
  • the above-mentioned single object similarity is cosine similarity
  • the step of determining the target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set includes:
  • S71 Find the single-object similarity with the largest value from each of the single-object similarities, and use it as the target similarity;
  • This embodiment takes the largest single object similarity as a hit only when its value is greater than the preset similarity threshold, using the corresponding candidate object in the candidate object set as the hit object of the target matching result, thereby further improving the accuracy of the determined target matching result.
  • the single object similarity with the largest value is found from each of the single object similarities, and the found single object similarity is used as the target similarity.
  • when the target similarity is greater than the preset similarity threshold, there is a single object similarity that meets the requirement; it is then determined that the result of the target matching result is success, and the candidate object corresponding to the target similarity in the candidate object set is used as the hit object of the target matching result.
  • when the target similarity is less than or equal to the preset similarity threshold, no single object similarity meets the requirement, and the result of the target matching result is determined to be failure.
  • this application also proposes a text image matching device, which includes:
  • the data acquisition module 100 is used to acquire objects to be matched;
  • the type recognition result determination module 200 is used to perform type recognition on the object to be matched and obtain the type recognition result;
  • the candidate object set determination module 300 is used to determine the candidate object set from the preset candidate object library according to the type recognition result
  • the fusion feature extraction module 400 is configured to perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
  • the candidate object feature determination module 500 is used to perform feature extraction on each candidate object in the candidate object set to obtain candidate object features
  • the single object similarity determination module 600 is used to calculate the similarity between the fusion features corresponding to the same candidate object and the candidate object features to obtain the single object similarity;
  • the target matching result determination module 700 is configured to determine the target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set.
  • a type recognition result is obtained by performing type recognition on the object to be matched; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed based on the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and based on each single object similarity and the candidate object set, a target matching result corresponding to the object to be matched is determined.
  • the above-mentioned type recognition result determination module 200 includes: a classification prediction result determination sub-module, a first result determination sub-module and a second result determination sub-module;
  • the classification prediction result determination sub-module is used to input the object to be matched into a preset text image classification model for classification prediction to obtain a classification prediction result;
  • the first result determination submodule is configured to determine that the type recognition result is a text type when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result;
  • the second result determination submodule is configured to determine that the type recognition result is an image type when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result.
  • the above-mentioned candidate object set determination module 300 includes: a first candidate object set determination sub-module and a second candidate object set determination sub-module;
  • the first candidate object set determination sub-module is configured to use the image sub-library in the candidate object library as the candidate object set when the type recognition result is a text type;
  • the second candidate object set determination sub-module is configured to use the text sub-library in the candidate object library as the candidate object set when the type recognition result is an image type.
  • the above-mentioned fusion feature extraction module 400 includes: a fusion feature extraction sub-module;
  • the fusion feature extraction sub-module is used to take any candidate object in the candidate object set as the target object, input the target object into the coding model corresponding to the type of the candidate object set for encoding to obtain the first encoding, input the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain the second encoding, splice the first encoding and the second encoding along the dimension to obtain the fusion code, and input the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  • the above-mentioned candidate object feature determination module 500 includes: a candidate object feature determination sub-module;
  • the candidate object feature determination submodule is used to input each candidate object in the candidate object set into the single object feature extraction model corresponding to the type of the candidate object set for feature extraction, and obtain the candidate object feature corresponding to each candidate object.
  • the above-mentioned single object similarity determination module 600 includes: cosine similarity calculation sub-module;
  • the cosine similarity calculation sub-module is used to use any of the candidate objects in the candidate object set as an object to be calculated, the fusion feature corresponding to the object to be calculated as the first feature, and the The candidate object feature corresponding to the object to be calculated is used as the second feature, and cosine similarity calculation is performed on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
  • the above-mentioned target matching result determination module 700 includes: a similarity screening sub-module and a target matching result determination sub-module;
  • the similarity screening sub-module is used to find the single object similarity with the largest value from each of the single object similarities as the target similarity;
  • the target matching result determination sub-module is used to determine whether the target similarity is greater than a preset similarity threshold; if so, the result of the target matching result is determined to be success, and the candidate object corresponding to the target similarity in the candidate object set is used as the hit object of the target matching result; if not, the result of the target matching result is determined to be failure.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3 .
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus, wherein the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store data involved in the text image matching method.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer program when executed by the processor, implements the text image matching method in any of the above embodiments.
  • the text image matching method includes: obtaining an object to be matched; performing type recognition on the object to be matched to obtain a type recognition result; determining a candidate object set from a preset candidate object library according to the type recognition result; performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; performing feature extraction on each candidate object in the candidate object set to obtain candidate object features; performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and determining a target matching result corresponding to the object to be matched based on each of the single object similarities and the candidate object set.
  • a type recognition result is obtained by performing type recognition on the object to be matched; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed based on the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and based on each single object similarity and the candidate object set, a target matching result corresponding to the object to be matched is determined.
  • the above step of performing type recognition on the object to be matched and obtaining the type recognition result includes: inputting the object to be matched into a preset text image classification model for classification prediction to obtain a classification prediction result; when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is a text type; when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is an image type.
  • the above step of determining a candidate object set from a preset candidate object library based on the type recognition result includes: when the type recognition result is a text type, using the image sub-library in the candidate object library as the candidate object set; when the type recognition result is an image type, using the text sub-library in the candidate object library as the candidate object set.
  • the above step of performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set includes: taking any one of the candidate objects in the candidate object set as the target object; inputting the target object into the encoding model corresponding to the type of the candidate object set for encoding to obtain the first encoding; inputting the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain the second encoding; splicing the first encoding and the second encoding along the dimension to obtain the fusion code; and inputting the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  • the above step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes: inputting each candidate object in the candidate object set with the corresponding Feature extraction is performed in a single object feature extraction model corresponding to the type of the candidate object set, and each candidate object corresponding to the candidate object feature is obtained.
  • the above-mentioned step of calculating the similarity of the fusion features and the candidate object features corresponding to the same candidate object to obtain the similarity of a single object includes: calculating any one of the candidate objects in the set The candidate object is used as the object to be calculated; the fusion feature corresponding to the object to be calculated is used as the first feature; the candidate object feature corresponding to the object to be calculated is used as the second feature; the first feature and The second feature performs cosine similarity calculation to obtain the single object similarity corresponding to the object to be calculated.
  • the above-mentioned single object similarity is cosine similarity
  • the step of determining the target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set includes: Find the single object similarity with the largest value from each of the single object similarities as the target similarity; determine whether the target similarity is greater than the preset similarity threshold; if so, determine that the target matches The result of the result is success, and the candidate object corresponding to the target similarity in the candidate object set is used as the hit object of the target matching result; if not, the result of the target matching result is determined to be failure.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • a computer program is stored thereon. When the computer program is executed by a processor, the above-mentioned steps are implemented.
  • the text image matching method of any of the above embodiments includes the steps of: obtaining an object to be matched; performing type recognition on the object to be matched to obtain a type recognition result; determining a candidate object set from a preset candidate object library according to the type recognition result; performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; performing feature extraction on each candidate object in the candidate object set to obtain candidate object features; performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
  • the text image matching method executed above performs type recognition on the object to be matched to obtain a type recognition result; determines a candidate object set from a preset candidate object library according to the type recognition result; performs fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; performs feature extraction on each candidate object in the candidate object set to obtain candidate object features; performs a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and determines a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
  • the above step of performing type recognition on the object to be matched to obtain a type recognition result includes: inputting the object to be matched into a preset text-image classification model for classification prediction to obtain a classification prediction result; when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is the text type; when the vector element corresponding to the text label is smaller than the vector element corresponding to the image label, determining that the type recognition result is the image type.
  • the above step of determining a candidate object set from a preset candidate object library according to the type recognition result includes: when the type recognition result is the text type, using the image sub-library in the candidate object library as the candidate object set; when the type recognition result is the image type, using the text sub-library in the candidate object library as the candidate object set.
  • the above step of performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set includes: taking any one candidate object in the candidate object set as a target object; inputting the target object into the encoding model corresponding to the type of the candidate object set for encoding to obtain a first encoding; inputting the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second encoding; concatenating the first encoding and the second encoding along the feature dimension to obtain a fusion encoding; and inputting the fusion encoding into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  • the above step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes: inputting each candidate object in the candidate object set into a single-object feature extraction model corresponding to the type of the candidate object set for feature extraction, obtaining the candidate object feature corresponding to each candidate object.
  • the above step of performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity includes: taking any one candidate object in the candidate object set as an object to be calculated; taking the fusion feature corresponding to the object to be calculated as a first feature; taking the candidate object feature corresponding to the object to be calculated as a second feature; and performing a cosine similarity calculation on the first feature and the second feature to obtain the single-object similarity corresponding to the object to be calculated.
  • the above-mentioned single-object similarity is a cosine similarity.
  • the step of determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set includes: finding the single-object similarity with the largest value among the single-object similarities as a target similarity; determining whether the target similarity is greater than a preset similarity threshold; if so, determining that the target matching result is a success and taking the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result; if not, determining that the target matching result is a failure.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.


Abstract

The present disclosure relates to the technical field of artificial intelligence. Disclosed are a text image matching method and apparatus, a device, and a storage medium. The method comprises: identifying the type of an object to be matched to obtain a type identification result; determining a candidate object set from a preset candidate object library according to the type identification result; extracting a fusion feature according to the object to be matched and each candidate object in the candidate object set; performing feature extraction on each candidate object in the candidate object set to obtain candidate object features; performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and determining, according to each single-object similarity and the candidate object set, a target matching result corresponding to the object to be matched. A direct matching operation between image features and text features is avoided, and the matching precision can be improved by adopting fusion features for text image matching.

Description

Text image matching method, device, equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 16, 2022, with application number 202210256789.1 and entitled "Text image matching method, device, equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a text image matching method, device, equipment and storage medium.
Background
Text-image matching refers to a cross-modal matching and retrieval method: given a piece of natural language text, images that match the text description are retrieved; or, given an image, text consistent with the image content is retrieved.
As a cross-modal matching and retrieval method, the system needs to process two kinds of information, images and natural language text, separately, and then perform matching based on the processing results. Some data sets and algorithms already exist in this area, but the inventors found that in these algorithms the image and the natural language text are first passed through separate feature extraction networks, and the two extracted features are then matched against each other. Because the difference between images and text is huge, features from these two modalities are often difficult to match, resulting in low matching accuracy.
Technical Problem
The main purpose of this application is to provide a text image matching method, device, equipment and storage medium, aiming to solve the technical problem of low matching accuracy in current text-image matching, in which the image and the natural language text are first passed through feature extraction networks separately and the two extracted features are then matched against each other.
Technical Solution
In order to achieve the above object, this application proposes a text image matching method, the method including:
obtaining an object to be matched;
performing type recognition on the object to be matched to obtain a type recognition result;
determining a candidate object set from a preset candidate object library according to the type recognition result;
performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
performing feature extraction on each candidate object in the candidate object set to obtain candidate object features;
performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
This application also proposes a text image matching device, the device including:
a data acquisition module, configured to obtain an object to be matched;
a type recognition result determination module, configured to perform type recognition on the object to be matched to obtain a type recognition result;
a candidate object set determination module, configured to determine a candidate object set from a preset candidate object library according to the type recognition result;
a fusion feature extraction module, configured to perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
a candidate object feature determination module, configured to perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
a single-object similarity determination module, configured to perform a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
a target matching result determination module, configured to determine a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
This application also proposes a computer device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the above text image matching method is implemented, the method including:
obtaining an object to be matched;
performing type recognition on the object to be matched to obtain a type recognition result;
determining a candidate object set from a preset candidate object library according to the type recognition result;
performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
performing feature extraction on each candidate object in the candidate object set to obtain candidate object features;
performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
This application also proposes a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the above text image matching method is implemented, the method including:
obtaining an object to be matched;
performing type recognition on the object to be matched to obtain a type recognition result;
determining a candidate object set from a preset candidate object library according to the type recognition result;
performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
performing feature extraction on each candidate object in the candidate object set to obtain candidate object features;
performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
Beneficial Effects
In the text image matching method, device, equipment and storage medium of this application, the method performs type recognition on the object to be matched to obtain a type recognition result; determines a candidate object set from a preset candidate object library according to the type recognition result; performs fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; performs feature extraction on each candidate object in the candidate object set to obtain candidate object features; performs a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and determines a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set. By first extracting fusion features from the object to be matched and the candidate objects, and then matching the fusion features against the candidate object features, a direct matching operation between image features and text features is avoided; moreover, using fusion features for text-image matching increases the matching precision and improves the accuracy of text-image matching.
Description of the Drawings
Figure 1 is a schematic flow chart of a text image matching method according to an embodiment of the present application;
Figure 2 is a schematic structural block diagram of a text image matching device according to an embodiment of the present application;
Figure 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
The realization of the purpose, functional features and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Best Mode of Carrying Out the Invention
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Referring to Figure 1, an embodiment of the present application provides a text image matching method, the method including:
S1: obtaining an object to be matched;
S2: performing type recognition on the object to be matched to obtain a type recognition result;
S3: determining a candidate object set from a preset candidate object library according to the type recognition result;
S4: performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
S5: performing feature extraction on each candidate object in the candidate object set to obtain candidate object features;
S6: performing a similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
S7: determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
In this embodiment, type recognition is performed on the object to be matched to obtain a type recognition result; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed based on the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; a similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and a target matching result corresponding to the object to be matched is determined according to each single-object similarity and the candidate object set. By first extracting fusion features from the object to be matched and the candidate objects, and then matching the fusion features against the candidate object features, a direct matching operation between image features and text features is avoided; moreover, using fusion features for text-image matching increases the matching precision and improves the accuracy of text-image matching.
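As a rough illustration of steps S1–S7, the flow can be sketched in Python. All function names and parameters here (`fuse`, `extract_single`, `similarity`, `threshold`) are hypothetical stand-ins for the patent's encoding, fusion feature extraction, and similarity models, not part of the disclosure:

```python
def text_image_match(query, is_text, image_library, text_library,
                     fuse, extract_single, similarity, threshold=0.5):
    """Sketch of S1-S7: a text query searches the image sub-library and
    vice versa; each candidate is scored by comparing the query-candidate
    fusion feature against the candidate's own feature."""
    candidates = image_library if is_text else text_library      # S2-S3
    scores = []
    for candidate in candidates:                                 # S4-S6
        fused = fuse(query, candidate)          # fusion feature
        single = extract_single(candidate)      # candidate object feature
        scores.append(similarity(fused, single))
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])      # S7
    return candidates[best] if scores[best] > threshold else None
```

The per-candidate loop makes explicit that one fusion feature is produced for each candidate, matching the one-to-one correspondence described below.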
For S1, the object to be matched may be obtained from user input, retrieved from a database, or obtained from a third-party application.
The object to be matched is the object on which text-image matching needs to be performed.
The object to be matched is a piece of text or an image.
For S2, type recognition is performed on the object to be matched in order to determine whether it is text or an image.
The type recognition result has a single value, and its value range includes the text type and the image type.
For S3, the type recognition result is matched against the type identifiers in the candidate object library, and the sub-library corresponding to the sub-library identifier associated with the matched type identifier in the candidate object library is used as the candidate object set.
The candidate object library includes type identifiers and sub-library identifiers.
For S4, intermediate features between text and image are extracted based on the object to be matched and each candidate object in the candidate object set, and the extracted intermediate features are used as fusion features.
Specifically, feature extraction is performed based on the encoding of the object to be matched and the encoding of each candidate object in the candidate object set, and the extracted features are used as the fusion features.
The number of fusion features is the same as the number of candidate objects in the candidate object set.
For S5, feature extraction is performed on each candidate object in the candidate object set, and the extracted features are used as candidate object features; it can be understood that candidate object features correspond one-to-one with candidate objects.
For S6, a cosine similarity or Euclidean distance calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object, and the calculated value is used as a single-object similarity.
That is, the number of single-object similarities is the same as the number of candidate objects in the candidate object set.
For S7, when the single-object similarity is a cosine similarity, the single-object similarity with the largest value is found among all the single-object similarities, and the candidate object in the candidate object set corresponding to the found similarity is used as the hit object of the target matching result corresponding to the object to be matched; when the single-object similarity is a Euclidean distance, the single-object similarity with the smallest value is found, and the candidate object in the candidate object set corresponding to the found similarity is used as the hit object of the target matching result corresponding to the object to be matched.
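The two similarity measures mentioned for S6, and the corresponding hit selection in S7 (maximum for cosine similarity, minimum for Euclidean distance), can be sketched as follows. This is a minimal illustration on plain Python lists, not the patent's implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_hit(similarities, candidates, metric="cosine"):
    """Pick the hit object: the maximum score for cosine similarity,
    the minimum score for Euclidean distance."""
    pick = max if metric == "cosine" else min
    best = pick(range(len(similarities)), key=lambda i: similarities[i])
    return candidates[best]
```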
In one embodiment, the above step of performing type recognition on the object to be matched to obtain a type recognition result includes:
S21: inputting the object to be matched into a preset text-image classification model for classification prediction to obtain a classification prediction result;
S22: when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is the text type;
S23: when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is the image type.
In this embodiment, classification prediction is performed through a text-image classification model, which improves the classification prediction result and thereby the accuracy of text-image matching.
For S21, the object to be matched is input into a preset text-image classification model for classification prediction, and the data obtained from the classification prediction is used as the classification prediction result.
The text-image classification model may use a binary classifier.
The classification prediction result is a vector with two elements, corresponding respectively to the text label and the image label; the vector elements are probability values.
For S22, when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label, the text-label element is the larger of the two, which means the object to be matched is a piece of text, so the type recognition result is determined to be the text type.
For S23, when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label, the image-label element is the larger of the two, which means the object to be matched is an image, so the type recognition result is determined to be the image type.
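Since the classification prediction result is a two-element probability vector, the decision in S22/S23 reduces to a single comparison. A minimal sketch follows; the tie-breaking behavior for equal elements is not specified in this description and is chosen arbitrarily here:

```python
def decide_type(prediction):
    """prediction: [p_text, p_image] from the binary text-image classifier.
    Returns 'text' when the text-label element is larger, otherwise 'image'."""
    p_text, p_image = prediction
    return "text" if p_text > p_image else "image"
```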
In one embodiment, the above step of determining a candidate object set from a preset candidate object library according to the type recognition result includes:
S31: when the type recognition result is the text type, using the image sub-library in the candidate object library as the candidate object set;
S32: when the type recognition result is the image type, using the text sub-library in the candidate object library as the candidate object set.
In this embodiment, the image sub-library is used as the candidate object set when the type recognition result is the text type, and the text sub-library is used as the candidate object set when the type recognition result is the image type, thereby providing a basis for fusion feature generation and text-image matching.
For S31, when the type recognition result is the text type, the object to be matched is a piece of text; therefore the image sub-library identified by the sub-library identifier corresponding to the text type in the candidate object library is used as the candidate object set, in which case the candidate objects in the candidate object set are images.
For S32, when the type recognition result is the image type, the object to be matched is an image; therefore the text sub-library identified by the sub-library identifier corresponding to the image type in the candidate object library is used as the candidate object set, in which case the candidate objects in the candidate object set are texts.
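The cross-modal selection of S31/S32 amounts to a lookup keyed by the type recognition result. A small sketch, with hypothetical names and placeholder candidates:

```python
# Sketch of S31/S32 (names and contents are hypothetical): the type recognition
# result selects the sub-library of the opposite modality as the candidate object set.
candidate_object_library = {
    "image_sub_library": ["img_1", "img_2"],  # placeholder image candidates
    "text_sub_library": ["txt_1", "txt_2"],   # placeholder text candidates
}

def select_candidate_set(type_result, library):
    if type_result == "text":                 # S31: text query -> image candidates
        return library["image_sub_library"]
    return library["text_sub_library"]        # S32: image query -> text candidates

print(select_candidate_set("text", candidate_object_library))   # ['img_1', 'img_2']
print(select_candidate_set("image", candidate_object_library))  # ['txt_1', 'txt_2']
```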
In one embodiment, the above step of performing fusion feature extraction according to the object to be matched and each candidate object in the candidate object set includes:
S41: taking any candidate object in the candidate object set as a target object;
S42: inputting the target object into an encoding model corresponding to the type of the candidate object set for encoding, to obtain a first code;
S43: inputting the object to be matched into the encoding model corresponding to the type recognition result for encoding, to obtain a second code;
S44: concatenating the first code and the second code along the feature dimension, to obtain a fusion code;
S45: inputting the fusion code into a preset fusion feature extraction model for feature extraction, to obtain the fusion feature corresponding to the target object.
In this embodiment, the object to be matched and the candidate object are first encoded and concatenated along the feature dimension, and the concatenation result is then input into the fusion feature extraction model for feature extraction, so that intermediate features between the image and the text are extracted, providing a basis for the matching operation between the fusion features and the candidate object features.
For S42, when the type of the candidate object set is the text type, the target object is input into the encoding model corresponding to the text type for encoding, and the encoded data is used as the first code; when the type of the candidate object set is the image type, the target object is input into the encoding model corresponding to the image type for encoding, and the encoded data is used as the first code.
Optionally, the encoding model is a fully connected layer. Since such an encoding model performs only shallow encoding, a large amount of the raw information in the target object is retained.
It can be understood that the encoding model may also be any other model capable of encoding, which is not limited here.
For S43, when the type recognition result is the text type, the object to be matched is input into the encoding model corresponding to the text type for encoding, and the encoded data is used as the second code; when the type recognition result is the image type, the object to be matched is input into the encoding model corresponding to the image type for encoding, and the encoded data is used as the second code.
For S44, optionally, the first code and the second code are concatenated along the feature dimension in text-then-image order, and the concatenated data is used as the fusion code; in this case the fusion code consists of the text code followed by the image code.
Optionally, the first code and the second code are concatenated along the feature dimension in image-then-text order, and the concatenated data is used as the fusion code; in this case the fusion code consists of the image code followed by the text code.
For S45, the fusion code is input into the preset fusion feature extraction model for feature extraction, and the extracted features are used as the fusion feature corresponding to the target object.
The fusion feature extraction model is a model trained on the ResNet50 network or the U-Net network. ResNet50 is a deep residual network; U-Net is a semantic segmentation network.
It can be understood that by repeating steps S41 to S45, the fusion feature corresponding to each candidate object in the candidate object set can be determined.
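Steps S42 to S44 can be sketched with NumPy stand-ins. Here the shallow encoding models are plain fully connected (matrix) projections and the fusion feature extraction model is left out; all names and dimensions are illustrative assumptions, not the patent's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
W_text = rng.standard_normal((128, 300))    # stand-in text encoding model (one fully connected layer)
W_image = rng.standard_normal((128, 2048))  # stand-in image encoding model (one fully connected layer)

def encode(x, weights):
    # A single fully connected layer: shallow encoding that keeps
    # much of the raw information (S42/S43).
    return weights @ x

text_to_match = rng.standard_normal(300)     # object to be matched (text type)
image_candidate = rng.standard_normal(2048)  # target object drawn from the image sub-library

first_code = encode(image_candidate, W_image)            # S42
second_code = encode(text_to_match, W_text)              # S43
fusion_code = np.concatenate([second_code, first_code])  # S44: text-then-image order

# The 256-dimensional fusion code would then be fed to the
# fusion feature extraction model (S45).
print(fusion_code.shape)  # (256,)
```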
In one embodiment, the above step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes:
S51: inputting each candidate object in the candidate object set into a single-object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
In this embodiment, a single-object feature extraction model corresponding to the type of the candidate object set is used for feature extraction, which improves the accuracy of the extracted features and hence the accuracy of text-image matching.
For S51, each candidate object in the candidate object set is input into the single-object feature extraction model corresponding to the type of the candidate object set for feature extraction, and the extracted features are used as one candidate object feature.
When the type of the candidate object set is the text type, the single-object feature extraction model corresponding to the type of the candidate object set is a model obtained by training an LSTM network on multiple text training samples; when the type of the candidate object set is the image type, the single-object feature extraction model corresponding to the type of the candidate object set is a model obtained by training the ResNet50 network or the U-Net network on multiple image training samples.
An LSTM network is a long short-term memory artificial neural network.
Each text training sample includes a text sample and text feature calibration data.
Each image training sample includes an image sample and image feature calibration data.
In one embodiment, the above step of performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity includes:
S61: taking any candidate object in the candidate object set as an object to be calculated;
S62: taking the fusion feature corresponding to the object to be calculated as a first feature;
S63: taking the candidate object feature corresponding to the object to be calculated as a second feature;
S64: performing cosine similarity calculation on the first feature and the second feature, to obtain the single-object similarity corresponding to the object to be calculated.
In this embodiment, cosine similarity is used for the similarity calculation; because cosine similarity tends to give a better solution, the accuracy of text-image matching is further improved.
For S64, the first feature and the second feature are features corresponding to the same candidate object; therefore, cosine similarity calculation is performed on the first feature and the second feature, and the calculated cosine similarity is used as the single-object similarity corresponding to the object to be calculated.
By repeating steps S61 to S64, the single-object similarity corresponding to each candidate object in the candidate object set can be determined.
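The cosine similarity of S64 can be written directly; the function name below is illustrative:

```python
import numpy as np

def cosine_similarity(first_feature, second_feature):
    # S64: cosine of the angle between the fusion feature and the
    # candidate object feature of the same candidate object.
    a = np.asarray(first_feature, dtype=float)
    b = np.asarray(second_feature, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0: identical directions
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0: orthogonal features
```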
In one embodiment, the above single-object similarity is a cosine similarity, and the step of determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set includes:
S71: finding, among the single-object similarities, the single-object similarity with the largest value as a target similarity;
S72: determining whether the target similarity is greater than a preset similarity threshold;
S73: if so, determining that the target matching result is a success, and taking the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result;
S74: if not, determining that the target matching result is a failure.
In this embodiment, the candidate object corresponding to the largest single-object similarity that is greater than the preset similarity threshold is taken as the hit object of the target matching result, which further improves the accuracy of the determined target matching result.
For S71, the single-object similarity with the largest value is found among the single-object similarities, and the found single-object similarity is used as the target similarity.
For S73, if so, that is, the target similarity is greater than the preset similarity threshold, there is a single-object similarity that meets the requirement, so the target matching result is determined to be a success, and the candidate object corresponding to the target similarity in the candidate object set is taken as the hit object of the target matching result.
For S74, if not, that is, the target similarity is less than or equal to the preset similarity threshold, there is no single-object similarity that meets the requirement, so the target matching result is determined to be a failure.
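Steps S71 to S74 can be sketched as an argmax followed by a threshold check; the function name, result keys, and threshold value are hypothetical:

```python
# Sketch of S71-S74 (names and threshold are hypothetical): find the largest
# single-object similarity, compare it with the preset similarity threshold,
# and report either the hit object or a failed match.
def match(similarities, candidates, threshold=0.5):
    target_index = max(range(len(similarities)), key=similarities.__getitem__)  # S71
    if similarities[target_index] > threshold:                                  # S72/S73
        return {"result": "success", "hit": candidates[target_index]}
    return {"result": "failure", "hit": None}                                   # S74

print(match([0.2, 0.8, 0.4], ["img_a", "img_b", "img_c"]))  # success, hit img_b
print(match([0.1, 0.3], ["img_a", "img_b"]))                # failure
```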
Referring to Figure 2, the present application further provides a text-image matching apparatus, the apparatus including:
a data acquisition module 100, configured to acquire an object to be matched;
a type recognition result determination module 200, configured to perform type recognition on the object to be matched to obtain a type recognition result;
a candidate object set determination module 300, configured to determine a candidate object set from a preset candidate object library according to the type recognition result;
a fusion feature extraction module 400, configured to perform fusion feature extraction according to the object to be matched and each candidate object in the candidate object set;
a candidate object feature determination module 500, configured to perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
a single-object similarity determination module 600, configured to perform similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity;
a target matching result determination module 700, configured to determine a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
In this embodiment, type recognition is performed on the object to be matched to obtain a type recognition result; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed according to the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and a target matching result corresponding to the object to be matched is determined according to each single-object similarity and the candidate object set. By first performing fusion feature extraction on the object to be matched and the candidate objects, and then performing the matching operation between the fusion features and the candidate object features, a direct matching operation between image features and text features is avoided; moreover, using fusion features for text-image matching can increase the matching precision, thereby improving the accuracy of text-image matching.
In one embodiment, the above type recognition result determination module 200 includes a classification prediction result determination sub-module, a first result determination sub-module, and a second result determination sub-module;
the classification prediction result determination sub-module is configured to input the object to be matched into a preset text image classification model for classification prediction to obtain a classification prediction result;
the first result determination sub-module is configured to determine that the type recognition result is the text type when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result;
the second result determination sub-module is configured to determine that the type recognition result is the image type when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result.
In one embodiment, the above candidate object set determination module 300 includes a first candidate object set determination sub-module and a second candidate object set determination sub-module;
the first candidate object set determination sub-module is configured to use the image sub-library in the candidate object library as the candidate object set when the type recognition result is the text type;
the second candidate object set determination sub-module is configured to use the text sub-library in the candidate object library as the candidate object set when the type recognition result is the image type.
In one embodiment, the above fusion feature extraction module 400 includes a fusion feature extraction sub-module;
the fusion feature extraction sub-module is configured to take any candidate object in the candidate object set as a target object; input the target object into an encoding model corresponding to the type of the candidate object set for encoding to obtain a first code; input the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second code; concatenate the first code and the second code along the feature dimension to obtain a fusion code; and input the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
In one embodiment, the above candidate object feature determination module 500 includes a candidate object feature determination sub-module;
the candidate object feature determination sub-module is configured to input each candidate object in the candidate object set into a single-object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
In one embodiment, the above single-object similarity determination module 600 includes a cosine similarity calculation sub-module;
the cosine similarity calculation sub-module is configured to take any candidate object in the candidate object set as an object to be calculated, take the fusion feature corresponding to the object to be calculated as a first feature, take the candidate object feature corresponding to the object to be calculated as a second feature, and perform cosine similarity calculation on the first feature and the second feature to obtain the single-object similarity corresponding to the object to be calculated.
In one embodiment, the above target matching result determination module 700 includes a similarity screening sub-module and a target matching result determination sub-module;
the similarity screening sub-module is configured to find, among the single-object similarities, the single-object similarity with the largest value as a target similarity;
the target matching result determination sub-module is configured to determine whether the target similarity is greater than a preset similarity threshold; if so, to determine that the target matching result is a success and take the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result; and if not, to determine that the target matching result is a failure.
Referring to Figure 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in Figure 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data for the text-image matching method. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the text-image matching method in any of the above embodiments.
The text-image matching method includes: acquiring an object to be matched; performing type recognition on the object to be matched to obtain a type recognition result; determining a candidate object set from a preset candidate object library according to the type recognition result; performing fusion feature extraction according to the object to be matched and each candidate object in the candidate object set; performing feature extraction on each candidate object in the candidate object set to obtain candidate object features; performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set.
In this embodiment, type recognition is performed on the object to be matched to obtain a type recognition result; a candidate object set is determined from a preset candidate object library according to the type recognition result; fusion feature extraction is performed according to the object to be matched and each candidate object in the candidate object set; feature extraction is performed on each candidate object in the candidate object set to obtain candidate object features; similarity calculation is performed on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity; and a target matching result corresponding to the object to be matched is determined according to each single-object similarity and the candidate object set. By first performing fusion feature extraction on the object to be matched and the candidate objects, and then performing the matching operation between the fusion features and the candidate object features, a direct matching operation between image features and text features is avoided; moreover, using fusion features for text-image matching can increase the matching precision, thereby improving the accuracy of text-image matching.
In one embodiment, the above step of performing type recognition on the object to be matched to obtain a type recognition result includes: inputting the object to be matched into a preset text image classification model for classification prediction to obtain a classification prediction result; when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is the text type; and when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is the image type.
In one embodiment, the above step of determining a candidate object set from a preset candidate object library according to the type recognition result includes: when the type recognition result is the text type, using the image sub-library in the candidate object library as the candidate object set; and when the type recognition result is the image type, using the text sub-library in the candidate object library as the candidate object set.
In one embodiment, the above step of performing fusion feature extraction according to the object to be matched and each candidate object in the candidate object set includes: taking any candidate object in the candidate object set as a target object; inputting the target object into an encoding model corresponding to the type of the candidate object set for encoding to obtain a first code; inputting the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second code; concatenating the first code and the second code along the feature dimension to obtain a fusion code; and inputting the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
In one embodiment, the above step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes: inputting each candidate object in the candidate object set into a single-object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
In one embodiment, the above step of performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single-object similarity includes: taking any candidate object in the candidate object set as an object to be calculated; taking the fusion feature corresponding to the object to be calculated as a first feature; taking the candidate object feature corresponding to the object to be calculated as a second feature; and performing cosine similarity calculation on the first feature and the second feature to obtain the single-object similarity corresponding to the object to be calculated.
In one embodiment, the above single-object similarity is a cosine similarity, and the step of determining a target matching result corresponding to the object to be matched according to each single-object similarity and the candidate object set includes: finding, among the single-object similarities, the single-object similarity with the largest value as a target similarity; determining whether the target similarity is greater than a preset similarity threshold; if so, determining that the target matching result is a success and taking the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result; and if not, determining that the target matching result is a failure.
本申请一实施例还提供一种计算机可读存储介质,该计算机可读存储介质可以是非易失性,也可以是易失性,其上存储有计算机程序,计算机程序被处理器执行时实现上述任一实施例中的文本图像匹配方法,包括步骤:获取待匹配对象;对所述待匹配对象进行类型识别,得到类型识别结果;根据所述类型识别结果,从预设的候选对象库中确定候选对象集;根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. A computer program is stored thereon. When the computer program is executed by a processor, the above-mentioned steps are implemented. The text image matching method in any embodiment includes the steps of: obtaining an object to be matched; performing type recognition on the object to be matched to obtain a type recognition result; and determining from a preset candidate object library according to the type recognition result. Candidate object set; perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set; perform feature extraction on each candidate object in the candidate object set to obtain candidate object features; extract the same The fusion feature corresponding to the candidate object and the candidate object feature are subjected to similarity calculation to obtain a single object similarity; according to the similarity of each single object and the candidate object set, the corresponding object to be matched is determined. target matching results.
上述执行的文本图像匹配方法,通过对所述待匹配对象进行类型识别,得到类型识别结果;根据所述类型识别结果,从预设的候选对象库中确定候选对象集;根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。通过首先对待匹配对象和候选对象进行融合特征提取,然后对融合特征与候选对象特征进行匹配操作,避免图像特征和文本特征的直接匹配操作,而且采用融合特征进行文本图像匹配可以增加匹配的精度,提高了文本图像匹配的准确性。The text image matching method executed above performs type recognition on the object to be matched to obtain a type recognition result; determines a candidate object set from a preset candidate object library according to the type recognition result; performs fusion feature extraction according to the object to be matched and each candidate object in the candidate object set; performs feature extraction on each candidate object in the candidate object set to obtain candidate object features; performs similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity; and determines a target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set. By first performing fusion feature extraction on the object to be matched and the candidate objects, and then matching the fusion features against the candidate object features, a direct matching operation between image features and text features is avoided; moreover, using fusion features for text-image matching increases matching precision and improves the accuracy of text-image matching.
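For illustration only, the steps summarized above can be sketched end to end in Python. This is a hedged, minimal sketch: the encoder, the similarity function, the 0.5 threshold, and the names `encode`, `cosine`, and `match` are all assumptions of this sketch, not details of the application's trained models.

```python
import math

# Minimal end-to-end sketch of the matching pipeline described above.
# Every model component is a deterministic stand-in, not a trained network.

def encode(obj, dim=4):
    # stand-in encoder: deterministic pseudo-embedding with strictly
    # positive components (so vector norms are never zero)
    return [((hash(obj) >> (8 * i)) & 0xFF) / 255.0 + 0.01 for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def match(query, query_type, library, threshold=0.5):
    # 1. select the candidate set of the opposite modality
    candidates = library["image" if query_type == "text" else "text"]
    sims = []
    for cand in candidates:
        # 2. fusion feature: codes of candidate and query, concatenated
        fusion_feature = encode(cand) + encode(query)
        # 3. candidate feature of the same dimensionality
        candidate_feature = encode(cand) + encode(cand)
        sims.append(cosine(fusion_feature, candidate_feature))
    # 4. keep the best candidate only if it clears the threshold
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] > threshold:
        return {"result": "success", "hit": candidates[best]}
    return {"result": "failure", "hit": None}

library = {"image": ["img_a.png", "img_b.png"], "text": ["caption a", "caption b"]}
print(match("a photo of a cat", "text", library))
```

A text query is scored against every image candidate, and the single decision at the end mirrors the threshold step of the embodiment.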
在一个实施例中,上述对所述待匹配对象进行类型识别,得到类型识别结果的步骤,包括:将所述待匹配对象输入预设的文本图像分类模型进行分类预测,得到分类预测结果;当所述分类预测结果中的与文本标签对应的向量元素大于所述分类预测结果中的与图像标签对应的向量元素时,确定所述类型识别结果为文本类型;当所述分类预测结果中的与所述文本标签对应的向量元素小于所述分类预测结果中的与所述图像标签对应的向量元素时,确定所述类型识别结果为图像类型。In one embodiment, the above step of performing type recognition on the object to be matched to obtain a type recognition result includes: inputting the object to be matched into a preset text image classification model for classification prediction to obtain a classification prediction result; when the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is a text type; when the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, determining that the type recognition result is an image type.
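The element-wise comparison in this embodiment can be shown in a few lines. A minimal sketch, assuming the classification model outputs a two-element vector; the model itself is stubbed out, and the name `identify_type` is hypothetical.

```python
# Sketch of the type-recognition decision: a text/image classification
# model yields a prediction vector [text_score, image_score]; whichever
# element is larger decides the type.

def identify_type(prediction):
    """prediction: [score for the text label, score for the image label]."""
    text_score, image_score = prediction
    if text_score > image_score:
        return "text"
    if text_score < image_score:
        return "image"
    return "undetermined"  # the embodiment leaves the equal-score case open

print(identify_type([0.8, 0.2]))  # -> text
print(identify_type([0.1, 0.9]))  # -> image
```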
在一个实施例中,上述根据所述类型识别结果,从预设的候选对象库中确定候选对象集的步骤,包括:当所述类型识别结果为文本类型时,将所述候选对象库中的图像子库作为所述候选对象集;当所述类型识别结果为图像类型时,将所述候选对象库中的文本子库作为所述候选对象集。In one embodiment, the above step of determining a candidate object set from a preset candidate object library according to the type recognition result includes: when the type recognition result is a text type, taking the image sub-library in the candidate object library as the candidate object set; when the type recognition result is an image type, taking the text sub-library in the candidate object library as the candidate object set.
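The cross-modal selection rule above is simple enough to sketch directly. This assumes a candidate object library stored as a plain dict; the keys `image_sublibrary` and `text_sublibrary` are hypothetical names introduced for this sketch.

```python
# A text query is matched against the image sub-library, and an image
# query against the text sub-library.

def select_candidate_set(type_result, candidate_library):
    if type_result == "text":
        return candidate_library["image_sublibrary"]
    if type_result == "image":
        return candidate_library["text_sublibrary"]
    raise ValueError("unexpected type recognition result: %r" % type_result)

library = {
    "image_sublibrary": ["img_a.png", "img_b.png"],
    "text_sublibrary": ["caption a", "caption b"],
}
print(select_candidate_set("text", library))   # -> ['img_a.png', 'img_b.png']
print(select_candidate_set("image", library))  # -> ['caption a', 'caption b']
```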
在一个实施例中,上述根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取的步骤,包括:将所述候选对象集中的任一个所述候选对象作为目标对象;将所述目标对象输入与所述候选对象集的类型对应的编码模型中进行编码,得到第一编码;将所述待匹配对象输入与所述类型识别结果对应的所述编码模型中进行编码,得到第二编码;将所述第一编码和所述第二编码,在维度上进行拼接,得到融合编码;将所述融合编码输入预设的融合特征提取模型进行特征提取,得到与所述目标对象对应的所述融合特征。In one embodiment, the above step of performing fusion feature extraction according to the object to be matched and each candidate object in the candidate object set includes: taking any candidate object in the candidate object set as a target object; inputting the target object into the encoding model corresponding to the type of the candidate object set for encoding to obtain a first code; inputting the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second code; concatenating the first code and the second code along the feature dimension to obtain a fusion code; inputting the fusion code into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
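The encode-concatenate-extract sequence above can be sketched as follows. This is a hedged illustration: the "models" below are simple stand-in functions with hypothetical names (`text_encoder`, `image_encoder`, `fusion_extractor`), not the embodiment's trained networks.

```python
# Hedged sketch of the fusion-encoding step: the candidate and the query
# are each encoded by the encoding model of their own modality, the two
# codes are concatenated along the feature dimension, and the result is
# fed to a fusion feature extraction model.

def text_encoder(text):
    # stand-in text encoding model: 3-dimensional code
    return [len(text) / 10.0, text.count(" ") / 5.0, 1.0]

def image_encoder(image_id):
    # stand-in image encoding model: 3-dimensional code
    return [len(image_id) / 10.0, 0.5, 1.0]

def fusion_extractor(code):
    # stand-in fusion feature extraction model (identity here)
    return code

def fuse(candidate, candidate_encoder, query, query_encoder):
    first_code = candidate_encoder(candidate)   # code of the candidate object
    second_code = query_encoder(query)          # code of the object to be matched
    fusion_code = first_code + second_code      # dimension-wise concatenation
    return fusion_extractor(fusion_code)

feature = fuse("img_001.png", image_encoder, "a red car", text_encoder)
print(len(feature))  # -> 6
```

The concatenation doubles the code dimension (3 + 3 = 6 here); in a real system, `fusion_extractor` would be a learned model that maps the fused code into the matching feature space.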
在一个实施例中,上述对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征的步骤,包括:将所述候选对象集中的每个所述候选对象分别输入与所述候选对象集的类型对应的单对象特征提取模型中进行特征提取,得到每个所述候选对象对应的所述候选对象特征。In one embodiment, the above step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes: inputting each candidate object in the candidate object set into the single object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
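The per-candidate extraction step is a straightforward map over the candidate set. A minimal sketch; `single_object_model` here is a hypothetical stand-in for the modality-specific feature extraction model.

```python
# Sketch of the per-candidate feature-extraction step: every candidate in
# the set is passed through the single-object feature extraction model
# matching the set's modality.

def extract_candidate_features(candidate_set, single_object_model):
    return {cand: single_object_model(cand) for cand in candidate_set}

single_object_model = lambda obj: [float(len(obj)), 1.0]  # hypothetical model
features = extract_candidate_features(["img_a.png", "img_b.png"], single_object_model)
print(features["img_a.png"])  # -> [9.0, 1.0]
```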
在一个实施例中,上述对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度的步骤,包括:将所述候选对象集中的任一个所述候选对象作为待计算对象;将所述待计算对象对应的所述融合特征作为第一特征;将所述待计算对象对应的所述候选对象特征作为第二特征;对所述第一特征与所述第二特征进行余弦相似度计算,得到所述待计算对象对应的所述单对象相似度。In one embodiment, the above step of performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity includes: taking any candidate object in the candidate object set as an object to be calculated; taking the fusion feature corresponding to the object to be calculated as a first feature; taking the candidate object feature corresponding to the object to be calculated as a second feature; performing cosine similarity calculation on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
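The cosine similarity named in this embodiment can be written out explicitly: the dot product of the two feature vectors divided by the product of their norms. A self-contained sketch:

```python
import math

# Cosine similarity between the first feature (fusion feature) and the
# second feature (candidate object feature).

def cosine_similarity(first_feature, second_feature):
    dot = sum(a * b for a, b in zip(first_feature, second_feature))
    norm_first = math.sqrt(sum(a * a for a in first_feature))
    norm_second = math.sqrt(sum(b * b for b in second_feature))
    return dot / (norm_first * norm_second)

print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 4))  # -> 0.0
print(round(cosine_similarity([1.0, 2.0], [2.0, 4.0]), 4))  # -> 1.0
```

Orthogonal features score 0, parallel features score 1, which is why a fixed threshold on this value (as in the following embodiment) gives a scale-invariant match decision.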
在一个实施例中,上述单对象相似度是余弦相似度,所述根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果的步骤,包括:从各个所述单对象相似度中找出值为最大的所述单对象相似度,作为目标相似度;判断所述目标相似度是否大于预设的相似度阈值;若是,则确定所述目标匹配结果的结果为成功,并且将所述目标相似度在所述候选对象集中对应的所述候选对象作为所述目标匹配结果的命中对象;若否,则确定所述目标匹配结果的结果为失败。In one embodiment, the above-mentioned single object similarity is a cosine similarity, and the step of determining the target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set includes: finding, from the single object similarities, the single object similarity with the largest value as the target similarity; determining whether the target similarity is greater than a preset similarity threshold; if so, determining that the target matching result is success, and taking the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result; if not, determining that the target matching result is failure.
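The max-then-threshold decision described in this embodiment can be sketched as follows. The threshold value 0.5 is an illustrative assumption, not a value given by the application.

```python
# Sketch of the decision step: pick the largest single object similarity
# as the target similarity and compare it with a preset threshold.

def target_match(similarities, candidate_set, threshold=0.5):
    """similarities[i] is the single object similarity of candidate_set[i]."""
    target_index = max(range(len(similarities)), key=similarities.__getitem__)
    if similarities[target_index] > threshold:
        return {"result": "success", "hit": candidate_set[target_index]}
    return {"result": "failure", "hit": None}

print(target_match([0.2, 0.9, 0.4], ["img_a", "img_b", "img_c"]))
# -> {'result': 'success', 'hit': 'img_b'}
print(target_match([0.1, 0.3], ["img_a", "img_b"]))
# -> {'result': 'failure', 'hit': None}
```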
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的和实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可以包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双倍数据速率SDRAM(DDR SDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。It should be noted that, in this document, the terms "comprising", "including", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes that element.
以上所述仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. 一种文本图像匹配方法,其中,所述方法包括:A text image matching method, wherein the method includes:
    获取待匹配对象;Get the object to be matched;
    对所述待匹配对象进行类型识别,得到类型识别结果;Perform type identification on the object to be matched to obtain a type identification result;
    根据所述类型识别结果,从预设的候选对象库中确定候选对象集;According to the type recognition result, determine a candidate object set from a preset candidate object library;
    根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;Perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
    对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;Perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
    对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;Perform similarity calculation on the fusion features corresponding to the same candidate object and the candidate object features to obtain single object similarity;
    根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。According to each of the single object similarities and the candidate object set, a target matching result corresponding to the object to be matched is determined.
  2. 根据权利要求1所述的文本图像匹配方法,其中,所述对所述待匹配对象进行类型识别,得到类型识别结果的步骤,包括:The text image matching method according to claim 1, wherein the step of performing type recognition on the object to be matched and obtaining the type recognition result includes:
    将所述待匹配对象输入预设的文本图像分类模型进行分类预测,得到分类预测结果;Input the object to be matched into a preset text image classification model for classification prediction, and obtain a classification prediction result;
    当所述分类预测结果中的与文本标签对应的向量元素大于所述分类预测结果中的与图像标签对应的向量元素时,确定所述类型识别结果为文本类型;When the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is a text type;
    当所述分类预测结果中的与所述文本标签对应的向量元素小于所述分类预测结果中的与所述图像标签对应的向量元素时,确定所述类型识别结果为图像类型。When the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is an image type.
  3. 根据权利要求1所述的文本图像匹配方法,其中,所述根据所述类型识别结果,从预设的候选对象库中确定候选对象集的步骤,包括:The text image matching method according to claim 1, wherein the step of determining a candidate object set from a preset candidate object library according to the type recognition result includes:
    当所述类型识别结果为文本类型时,将所述候选对象库中的图像子库作为所述候选对象集;When the type recognition result is a text type, use the image sub-library in the candidate object library as the candidate object set;
    当所述类型识别结果为图像类型时,将所述候选对象库中的文本子库作为所述候选对象集。When the type recognition result is an image type, the text sub-library in the candidate object library is used as the candidate object set.
  4. 根据权利要求1所述的文本图像匹配方法,其中,所述根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取的步骤,包括:The text image matching method according to claim 1, wherein the step of extracting fusion features based on the object to be matched and each candidate object in the candidate object set includes:
    将所述候选对象集中的任一个所述候选对象作为目标对象;Use any candidate object in the candidate object set as a target object;
    将所述目标对象输入与所述候选对象集的类型对应的编码模型中进行编码,得到第一编码;Enter the target object into a coding model corresponding to the type of the candidate object set for coding to obtain a first coding;
    将所述待匹配对象输入与所述类型识别结果对应的所述编码模型中进行编码,得到第二编码;Enter the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second encoding;
    将所述第一编码和所述第二编码,在维度上进行拼接,得到融合编码;Splice the first code and the second code in dimensions to obtain a fusion code;
    将所述融合编码输入预设的融合特征提取模型进行特征提取,得到与所述目标对象对应的所述融合特征。The fusion code is input into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  5. 根据权利要求1所述的文本图像匹配方法,其中,所述对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征的步骤,包括:The text image matching method according to claim 1, wherein the step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes:
    将所述候选对象集中的每个所述候选对象分别输入与所述候选对象集的类型对应的单对象特征提取模型中进行特征提取,得到每个所述候选对象对应的所述候选对象特征。Each candidate object in the candidate object set is input into a single object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
  6. 根据权利要求1所述的文本图像匹配方法,其中,所述对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度的步骤,包括:The text image matching method according to claim 1, wherein the step of calculating the similarity of the fusion features corresponding to the same candidate object and the candidate object features to obtain the single object similarity includes:
    将所述候选对象集中的任一个所述候选对象作为待计算对象;Use any one of the candidate objects in the candidate object set as an object to be calculated;
    将所述待计算对象对应的所述融合特征作为第一特征;Use the fusion feature corresponding to the object to be calculated as the first feature;
    将所述待计算对象对应的所述候选对象特征作为第二特征;Use the candidate object feature corresponding to the object to be calculated as the second feature;
    对所述第一特征与所述第二特征进行余弦相似度计算,得到所述待计算对象对应的所述单对象相似度。Cosine similarity calculation is performed on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
  7. 根据权利要求1所述的文本图像匹配方法,其中,所述单对象相似度是余弦相似度,所述根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果的步骤,包括:The text image matching method according to claim 1, wherein the single object similarity is a cosine similarity, and the step of determining the target matching result corresponding to the object to be matched according to each of the single object similarities and the candidate object set includes:
    从各个所述单对象相似度中找出值为最大的所述单对象相似度,作为目标相似度;Find the single object similarity with the largest value from each of the single object similarities as the target similarity;
    判断所述目标相似度是否大于预设的相似度阈值;Determine whether the target similarity is greater than a preset similarity threshold;
    若是,则确定所述目标匹配结果的结果为成功,并且将所述目标相似度在所述候选对象集中对应的所述候选对象作为所述目标匹配结果的命中对象;If so, determine that the result of the target matching result is successful, and use the candidate object corresponding to the target similarity in the candidate object set as the hit object of the target matching result;
    若否,则确定所述目标匹配结果的结果为失败。If not, the result of the target matching result is determined to be failure.
  8. 一种文本图像匹配装置,其中,所述装置包括:A text image matching device, wherein the device includes:
    数据获取模块,用于获取待匹配对象;Data acquisition module, used to obtain objects to be matched;
    类型识别结果确定模块,用于对所述待匹配对象进行类型识别,得到类型识别结果;A type recognition result determination module is used to perform type recognition on the object to be matched and obtain a type recognition result;
    候选对象集确定模块,用于根据所述类型识别结果,从预设的候选对象库中确定候选对象集;A candidate object set determination module, configured to determine a candidate object set from a preset candidate object library according to the type recognition result;
    融合特征提取模块,用于根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;A fusion feature extraction module, configured to perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
    候选对象特征确定模块,用于对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;A candidate object feature determination module, configured to perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
    单对象相似度确定模块,用于对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;A single object similarity determination module, used to calculate the similarity between the fusion features corresponding to the same candidate object and the candidate object features to obtain the single object similarity;
    目标匹配结果确定模块,用于根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。A target matching result determination module, configured to determine a target matching result corresponding to the object to be matched based on each of the single object similarities and the candidate object set.
  9. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其中,所述处理器执行所述计算机程序时实现一种文本图像匹配方法,所述方法包括:A computer device, comprising a memory and a processor, the memory storing a computer program, wherein when the processor executes the computer program, a text image matching method is implemented, the method comprising:
    获取待匹配对象;Get the object to be matched;
    对所述待匹配对象进行类型识别,得到类型识别结果;Perform type identification on the object to be matched to obtain a type identification result;
    根据所述类型识别结果,从预设的候选对象库中确定候选对象集;According to the type recognition result, determine a candidate object set from a preset candidate object library;
    根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;Perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
    对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;Perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
    对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;Perform similarity calculation on the fusion features corresponding to the same candidate object and the candidate object features to obtain single object similarity;
    根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。According to each of the single object similarities and the candidate object set, a target matching result corresponding to the object to be matched is determined.
  10. 根据权利要求9所述的计算机设备,其中,所述对所述待匹配对象进行类型识别,得到类型识别结果的步骤,包括:The computer device according to claim 9, wherein the step of performing type identification on the object to be matched and obtaining a type identification result includes:
    将所述待匹配对象输入预设的文本图像分类模型进行分类预测,得到分类预测结果;Input the object to be matched into a preset text image classification model for classification prediction, and obtain a classification prediction result;
    当所述分类预测结果中的与文本标签对应的向量元素大于所述分类预测结果中的与图像标签对应的向量元素时,确定所述类型识别结果为文本类型;When the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is a text type;
    当所述分类预测结果中的与所述文本标签对应的向量元素小于所述分类预测结果中的与所述图像标签对应的向量元素时,确定所述类型识别结果为图像类型。When the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is an image type.
  11. 根据权利要求9所述的计算机设备,其中,所述根据所述类型识别结果,从预设的候选对象库中确定候选对象集的步骤,包括:The computer device according to claim 9, wherein the step of determining a candidate object set from a preset candidate object library according to the type recognition result includes:
    当所述类型识别结果为文本类型时,将所述候选对象库中的图像子库作为所述候选对象集;When the type recognition result is a text type, use the image sub-library in the candidate object library as the candidate object set;
    当所述类型识别结果为图像类型时,将所述候选对象库中的文本子库作为所述候选对象集。When the type recognition result is an image type, the text sub-library in the candidate object library is used as the candidate object set.
  12. 根据权利要求9所述的计算机设备,其中,所述根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取的步骤,包括:The computer device according to claim 9, wherein the step of extracting fusion features based on the object to be matched and each candidate object in the candidate object set includes:
    将所述候选对象集中的任一个所述候选对象作为目标对象;Use any candidate object in the candidate object set as a target object;
    将所述目标对象输入与所述候选对象集的类型对应的编码模型中进行编码,得到第一编码;Enter the target object into a coding model corresponding to the type of the candidate object set for coding to obtain a first coding;
    将所述待匹配对象输入与所述类型识别结果对应的所述编码模型中进行编码,得到第二编码;Enter the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second encoding;
    将所述第一编码和所述第二编码,在维度上进行拼接,得到融合编码;Splice the first code and the second code in dimensions to obtain a fusion code;
    将所述融合编码输入预设的融合特征提取模型进行特征提取,得到与所述目标对象对应的所述融合特征。The fusion code is input into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  13. 根据权利要求9所述的计算机设备,其中,所述对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征的步骤,包括:The computer device according to claim 9, wherein the step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes:
    将所述候选对象集中的每个所述候选对象分别输入与所述候选对象集的类型对应的单对象特征提取模型中进行特征提取,得到每个所述候选对象对应的所述候选对象特征。Each candidate object in the candidate object set is input into a single object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
  14. 根据权利要求9所述的计算机设备,其中,所述对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度的步骤,包括:The computer device according to claim 9, wherein the step of performing similarity calculation on the fusion feature corresponding to the same candidate object and the candidate object feature to obtain a single object similarity includes:
    将所述候选对象集中的任一个所述候选对象作为待计算对象;Use any one of the candidate objects in the candidate object set as an object to be calculated;
    将所述待计算对象对应的所述融合特征作为第一特征;Use the fusion feature corresponding to the object to be calculated as the first feature;
    将所述待计算对象对应的所述候选对象特征作为第二特征;Use the candidate object feature corresponding to the object to be calculated as the second feature;
    对所述第一特征与所述第二特征进行余弦相似度计算,得到所述待计算对象对应的所述单对象相似度。Cosine similarity calculation is performed on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
  15. 一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现一种文本图像匹配方法,所述方法包括:A computer-readable storage medium with a computer program stored thereon, wherein when the computer program is executed by a processor, a text-image matching method is implemented, and the method includes:
    获取待匹配对象;Get the object to be matched;
    对所述待匹配对象进行类型识别,得到类型识别结果;Perform type identification on the object to be matched to obtain a type identification result;
    根据所述类型识别结果,从预设的候选对象库中确定候选对象集;According to the type recognition result, determine a candidate object set from a preset candidate object library;
    根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取;Perform fusion feature extraction based on the object to be matched and each candidate object in the candidate object set;
    对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征;Perform feature extraction on each candidate object in the candidate object set to obtain candidate object features;
    对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度;Perform similarity calculation on the fusion features corresponding to the same candidate object and the candidate object features to obtain single object similarity;
    根据各个所述单对象相似度和所述候选对象集,确定与所述待匹配对象对应的目标匹配结果。According to each of the single object similarities and the candidate object set, a target matching result corresponding to the object to be matched is determined.
  16. 根据权利要求15所述的计算机可读存储介质,其中,所述对所述待匹配对象进行类型识别,得到类型识别结果的步骤,包括:The computer-readable storage medium according to claim 15, wherein the step of performing type identification on the object to be matched and obtaining the type identification result includes:
    将所述待匹配对象输入预设的文本图像分类模型进行分类预测,得到分类预测结果;Input the object to be matched into a preset text image classification model for classification prediction, and obtain a classification prediction result;
    当所述分类预测结果中的与文本标签对应的向量元素大于所述分类预测结果中的与图像标签对应的向量元素时,确定所述类型识别结果为文本类型;When the vector element corresponding to the text label in the classification prediction result is greater than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is a text type;
    当所述分类预测结果中的与所述文本标签对应的向量元素小于所述分类预测结果中的与所述图像标签对应的向量元素时,确定所述类型识别结果为图像类型。When the vector element corresponding to the text label in the classification prediction result is smaller than the vector element corresponding to the image label in the classification prediction result, it is determined that the type identification result is an image type.
  17. 根据权利要求15所述的计算机可读存储介质,其中,所述根据所述类型识别结果,从预设的候选对象库中确定候选对象集的步骤,包括:The computer-readable storage medium according to claim 15, wherein the step of determining a candidate object set from a preset candidate object library according to the type recognition result includes:
    当所述类型识别结果为文本类型时,将所述候选对象库中的图像子库作为所述候选对象集;When the type recognition result is a text type, use the image sub-library in the candidate object library as the candidate object set;
    当所述类型识别结果为图像类型时,将所述候选对象库中的文本子库作为所述候选对象集。When the type recognition result is an image type, the text sub-library in the candidate object library is used as the candidate object set.
  18. 根据权利要求15所述的计算机可读存储介质,其中,所述根据所述待匹配对象和所述候选对象集中的每个候选对象进行融合特征提取的步骤,包括:The computer-readable storage medium according to claim 15, wherein the step of performing fusion feature extraction based on the object to be matched and each candidate object in the candidate object set includes:
    将所述候选对象集中的任一个所述候选对象作为目标对象;Use any candidate object in the candidate object set as a target object;
    将所述目标对象输入与所述候选对象集的类型对应的编码模型中进行编码,得到第一编码;Enter the target object into a coding model corresponding to the type of the candidate object set for coding to obtain a first coding;
    将所述待匹配对象输入与所述类型识别结果对应的所述编码模型中进行编码,得到第二编码;Enter the object to be matched into the encoding model corresponding to the type recognition result for encoding to obtain a second encoding;
    将所述第一编码和所述第二编码,在维度上进行拼接,得到融合编码;Splice the first code and the second code in dimensions to obtain a fusion code;
    将所述融合编码输入预设的融合特征提取模型进行特征提取,得到与所述目标对象对应的所述融合特征。The fusion code is input into a preset fusion feature extraction model for feature extraction to obtain the fusion feature corresponding to the target object.
  19. 根据权利要求15所述的计算机可读存储介质,其中,所述对所述候选对象集中的每个所述候选对象进行特征提取,得到候选对象特征的步骤,包括:The computer-readable storage medium according to claim 15, wherein the step of performing feature extraction on each candidate object in the candidate object set to obtain candidate object features includes:
    将所述候选对象集中的每个所述候选对象分别输入与所述候选对象集的类型对应的单对象特征提取模型中进行特征提取,得到每个所述候选对象对应的所述候选对象特征。Each candidate object in the candidate object set is input into a single object feature extraction model corresponding to the type of the candidate object set for feature extraction, to obtain the candidate object feature corresponding to each candidate object.
  20. 根据权利要求15所述的计算机可读存储介质,其中,所述对同一所述候选对象对应的所述融合特征和所述候选对象特征进行相似度计算,得到单对象相似度的步骤,包括:The computer-readable storage medium according to claim 15, wherein the step of performing similarity calculation on the fusion feature and the candidate object feature corresponding to the same candidate object to obtain a single object similarity includes:
    将所述候选对象集中的任一个所述候选对象作为待计算对象;Use any one of the candidate objects in the candidate object set as an object to be calculated;
    将所述待计算对象对应的所述融合特征作为第一特征;Use the fusion feature corresponding to the object to be calculated as the first feature;
    将所述待计算对象对应的所述候选对象特征作为第二特征;Use the candidate object feature corresponding to the object to be calculated as the second feature;
    对所述第一特征与所述第二特征进行余弦相似度计算,得到所述待计算对象对应的所述单对象相似度。Cosine similarity calculation is performed on the first feature and the second feature to obtain the single object similarity corresponding to the object to be calculated.
PCT/CN2022/090161 2022-03-16 2022-04-29 Text image matching method and apparatus, device, and storage medium WO2023173547A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210256789.1A CN114723986A (en) 2022-03-16 2022-03-16 Text image matching method, device, equipment and storage medium
CN202210256789.1 2022-03-16

Publications (1)

Publication Number Publication Date
WO2023173547A1 true WO2023173547A1 (en) 2023-09-21

Family

ID=82238459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090161 WO2023173547A1 (en) 2022-03-16 2022-04-29 Text image matching method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114723986A (en)
WO (1) WO2023173547A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115966061B (en) * 2022-12-28 2023-10-24 上海帜讯信息技术股份有限公司 Disaster early warning processing method, system and device based on 5G message

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110096641A (en) * 2019-03-19 2019-08-06 深圳壹账通智能科技有限公司 Picture and text matching process, device, equipment and storage medium based on image analysis
CN110147457A (en) * 2019-02-28 2019-08-20 腾讯科技(深圳)有限公司 Picture and text matching process, device, storage medium and equipment
CN110825901A (en) * 2019-11-11 2020-02-21 腾讯科技(北京)有限公司 Image-text matching method, device and equipment based on artificial intelligence and storage medium
CN112148839A (en) * 2020-09-29 2020-12-29 北京小米松果电子有限公司 Image-text matching method and device and storage medium
CN112818157A (en) * 2021-02-10 2021-05-18 浙江大学 Combined query image retrieval method based on multi-order confrontation characteristic learning

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
AU2008200301A1 (en) * 2008-01-22 2009-08-06 The University Of Western Australia Image recognition
CN113392341A (en) * 2020-09-30 2021-09-14 腾讯科技(深圳)有限公司 Cover selection method, model training method, device, equipment and storage medium
CN112598575B (en) * 2020-12-22 2022-05-03 电子科技大学 Image information fusion and super-resolution reconstruction method based on feature processing
CN113656660B (en) * 2021-10-14 2022-06-28 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN110147457A (en) * 2019-02-28 2019-08-20 腾讯科技(深圳)有限公司 Picture and text matching process, device, storage medium and equipment
CN110096641A (en) * 2019-03-19 2019-08-06 深圳壹账通智能科技有限公司 Picture and text matching process, device, equipment and storage medium based on image analysis
CN110825901A (en) * 2019-11-11 2020-02-21 腾讯科技(北京)有限公司 Image-text matching method, device and equipment based on artificial intelligence and storage medium
CN112148839A (en) * 2020-09-29 2020-12-29 北京小米松果电子有限公司 Image-text matching method and device and storage medium
CN112818157A (en) * 2021-02-10 2021-05-18 浙江大学 Combined query image retrieval method based on multi-order confrontation characteristic learning

Also Published As

Publication number Publication date
CN114723986A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN111104495B (en) Information interaction method, device, equipment and storage medium based on intention recognition
CN111651992A (en) Named entity labeling method and device, computer equipment and storage medium
CN113704476B (en) Target event extraction data processing system
CN114245203B (en) Video editing method, device, equipment and medium based on script
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN114139551A (en) Method and device for training intention recognition model and method and device for recognizing intention
CN113722461B (en) Target event extraction data processing system
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN111859916B (en) Method, device, equipment and medium for extracting key words of ancient poems and generating poems
CN112766319A (en) Dialogue intention recognition model training method and device, computer equipment and medium
CN116450796A (en) Intelligent question-answering model construction method and device
CN111695053A (en) Sequence labeling method, data processing device and readable storage medium
CN111259113A (en) Text matching method and device, computer readable storage medium and computer equipment
CN115495553A (en) Query text ordering method and device, computer equipment and storage medium
CN113723070A (en) Text similarity model training method, text similarity detection method and text similarity detection device
CN113468433A (en) Target event extraction data processing system
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN113408287A (en) Entity identification method and device, electronic equipment and storage medium
CN112270184A (en) Natural language processing method, device and storage medium
CN115203372A (en) Text intention classification method and device, computer equipment and storage medium
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
WO2023173547A1 (en) Text image matching method and apparatus, device, and storage medium
CN113254575B (en) Machine reading understanding method and system based on multi-step evidence reasoning
CN117093682A (en) Intention recognition method, device, computer equipment and storage medium
CN114048753B (en) Word sense recognition model training, word sense judging method, device, equipment and medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22931574

Country of ref document: EP

Kind code of ref document: A1