CN108763380B - Trademark identification retrieval method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN108763380B
Authority
CN
China
Prior art keywords
trademark
image
minimum unit
sample
image feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810481421.9A
Other languages
Chinese (zh)
Other versions
CN108763380A (en
Inventor
徐庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Guofang Software Technology Co ltd
Xu Qing
Foshan Guofang Identification Technology Co Ltd
Original Assignee
Foshan Guofang Trademark Identification Technology Co ltd
Foshan Guofang Trademark Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Guofang Trademark Identification Technology Co ltd, Foshan Guofang Trademark Service Co ltd
Priority to CN201810481421.9A
Publication of CN108763380A
Application granted
Publication of CN108763380B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G06Q50/184 Intellectual property management

Abstract

The invention relates to a trademark identification retrieval method and device, computer equipment and a storage medium. The method comprises the following steps: converting the image data of an input trademark by searching a sample image database to obtain an image feature descriptor and associated text information of the input trademark; segmenting the image feature descriptor and the associated text information of the input trademark respectively; obtaining image feature descriptor combination unit data and associated text information combination unit data of the input trademark according to a preset minimum unit combination rule; searching a sample trademark database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each matched preliminary search sample trademark; processing the single-item approximation rates to obtain the comprehensive approximation rate between each preliminary search sample trademark and the input trademark; and sorting the preliminary search sample trademarks whose comprehensive approximation rate meets a preset requirement to obtain the retrieval result.

Description

Trademark identification retrieval method and device, computer equipment and storage medium
Technical Field
The present application relates to the technical field of trademark information retrieval, and in particular, to a trademark identification retrieval method, apparatus, computer device and storage medium.
Background
Trademark information retrieval is an important task in procedures such as trademark registration application, trademark review, trademark authorization, trademark monitoring, and the maintenance and management of trademarks by trademark registrants. The quality of trademark information retrieval directly affects the quality of the work in each of these procedures.
Traditional trademark retrieval methods depend largely on the subjective judgment of the searcher when determining search keywords. Searchers with different levels of expertise generally make different judgments, and if the searcher's expertise is insufficient, relevant trademarks are easily missed from the search results.
In the course of implementation, the inventor found at least the following problem in the conventional technology: in the traditional way of determining the input trademark and the search keywords for trademark identification retrieval, the setting and algorithm of the search keywords cannot reflect the indirectly associated text information of the input trademark or arbitrary local information of the trademark image. That information is therefore not searched, and relevant trademarks are easily missed from the search results.
Disclosure of Invention
In view of the above, it is necessary to provide a trademark identification search method, apparatus, computer device and storage medium capable of acquiring a more complete trademark search keyword and improving the recall ratio of the same or similar trademark image.
In order to achieve the above object, in one aspect, an embodiment of the present invention provides a trademark identification retrieval method, including the following steps:
converting image data of the input trademark by retrieving a sample image database to obtain an image feature descriptor and associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
respectively segmenting the image feature descriptors and the associated text information of the input trademark to obtain minimum units of the image feature descriptors and the associated text information of the input trademark; the minimum unit of the image feature descriptor is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one character or a plurality of character combinations with meanings corresponding to any text information characteristic point represented by the associated text information;
respectively combining each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
searching a sample trademark database in the sample image database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each matched primary search sample trademark, each image feature descriptor minimum unit and each associated text information minimum unit of the primary search sample trademark;
obtaining a single-item approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary retrieval sample trademark, the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark; processing the single-item approximation rate to obtain the comprehensive approximation rate of the initial search sample trademark and the input trademark;
and sequencing the primary retrieval sample trademarks of which the comprehensive approximation rate meets the preset requirement to obtain a retrieval result.
On the other hand, an embodiment of the present invention further provides a trademark identification and retrieval device, including:
the conversion module is used for converting the image data of the input trademark through a retrieval sample image database to obtain an image feature descriptor and associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
the segmentation module is used for respectively segmenting the image feature descriptors and the associated text information of the input trademark to obtain minimum units of the image feature descriptors and the associated text information of the input trademark; the minimum unit of the image feature descriptor is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one character or a plurality of character combinations with meanings corresponding to any text information characteristic point represented by the associated text information;
the combination module is used for respectively combining each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
the retrieval module is used for retrieving a sample trademark database in the sample image database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each matched preliminary retrieval sample trademark, each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary retrieval sample trademark;
the acquisition approximation rate module is used for acquiring a single approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary retrieval sample trademark, the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark; processing the single-item approximation rate to obtain the comprehensive approximation rate of the initial search sample trademark and the input trademark;
and the sequencing module is used for sequencing the primary retrieval sample trademarks of which the comprehensive approximation rates meet the preset requirements to obtain retrieval results.
A computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the trademark identification retrieval method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the above-mentioned trademark identification retrieval method.
One of the above technical solutions has the following advantages and beneficial effects:
A sample image database is established in advance from the mass of trademark and knowledge data already held by the system. The sample image data is converted, segmented and combined to obtain the image feature descriptors, associated text information, combination unit data and arbitrary local information of the trademark image for each sample image, and the input trademark is converted, segmented and combined in the same way to obtain the corresponding data for the input trademark. More complete trademark search keywords are thereby acquired, which effectively addresses the automation, intelligence and accuracy of search keyword acquisition in trademark retrieval and moves from conventional manual entry to intelligent automatic identification and entry. This overcomes the earlier defects that manual entry could not easily exhaust the local combinations formed by similar-shaped characters, similar-sounding characters, similar-meaning characters and graphics, that search keywords were inconsistent, and that extracted information was easily missed. The embodiment of the invention can effectively identify both standard and non-standard text images, thereby improving identification accuracy. The associated information of big data can be used to infer and identify the character content, pronunciation, character combination meaning and trademark graphic element codes that can be read from or are indirectly recorded in the input trademark or sample trademark, together with the shape, pronunciation and meaning features they reflect, so as to improve the matching of identical or similar images in trademark identification retrieval and increase the recall ratio and precision ratio for identical or similar trademarks.
Drawings
FIG. 1 is a diagram showing an example of an application environment of a trademark identification retrieval method;
FIG. 2 is a first schematic flowchart of a trademark identification retrieval method in one embodiment;
FIG. 3 is a second schematic flowchart of a trademark identification retrieval method in one embodiment;
FIG. 4 is a first exemplary image for a trademark identification retrieval method in one embodiment;
FIG. 5 is a second exemplary image for a trademark identification retrieval method in one embodiment;
FIG. 6 is a diagram illustrating the correspondence between the position data of pixels on the contour line of the image in FIG. 4 and the coordinate regions of a standard coordinate system of 10 × 10 specification;
FIG. 7 is a diagram illustrating the correspondence between the position data of pixels on the contour line of the image in FIG. 4 and the coordinate regions of a standard coordinate system of 20 × 20 specification;
FIG. 8 is a diagram illustrating the correspondence between the position data of pixels on the contour line of the image in FIG. 5 and the coordinate regions of a standard coordinate system of 10 × 10 specification;
FIG. 9 is a diagram illustrating the correspondence between the position data of pixels on the contour line of the image in FIG. 5 and the coordinate regions of a standard coordinate system of 20 × 20 specification;
FIG. 10 is a block diagram showing the construction of a trademark identification retrieval means in one embodiment;
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The constituent elements of a trademark identification image can vary widely, and so can the factors that make trademarks similar. Traditional trademark identification retrieval determines the input trademark and the search keywords in one of two ways: by uploading a picture file or by entering text. The setting and algorithm of the search keywords are limited to combinations of the uploaded picture file and the entered text, and cannot reflect the indirectly associated text information of the input trademark or arbitrary local information of the trademark image. Both kinds of information can cause the input trademark and a sample trademark to be similar, so omitting them easily leads to relevant trademarks being missed from the search results.
Furthermore, the conventional technology can convert a standard text image into machine-editable text by Optical Character Recognition (OCR), but it has the following limitations: recognition accuracy is low for non-standard text images; it cannot identify information that is not directly displayed in the text image, such as pronunciation, whether a character combination has a meaning, graphic element codes, and other information reflecting the shape, pronunciation and meaning features of the image; and when the characters recognized from an image are used as keywords to search for identical or similar trademark images, a certain search effect can be achieved, but because the characters do not describe the other image content, some identical or similar trademark images are inevitably missed.
Clearly, most traditional trademark retrieval still relies on manual entry, with evidently low efficiency and a heavy workload.
The embodiment of the invention establishes a sample image database from the mass of trademark and knowledge data already held by the system. The sample image database comprises a sample trademark database, a trademark constituent element sample image database, a character dictionary database and a word dictionary database. The sample image data is converted, segmented and combined to obtain the image feature descriptors, associated text information, combination unit data and arbitrary local information of the trademark image for each sample image, and the input trademark is converted, segmented and combined to obtain the same data for the input trademark. The sample trademark database is then searched based on the combination unit data and the arbitrary local information of the trademark image to obtain the matched preliminary search sample trademarks, together with the image, text, and shape, sound and meaning feature information recorded for those sample trademarks and the matched minimum unit and combination unit data. The single-item approximation rate, mismatch rate and comprehensive approximation rate for shape, sound, meaning and search keywords are calculated between each preliminary search sample trademark and the input trademark. Finally, the sample trademarks that meet the preset single-item approximation rate, mismatch rate and comprehensive approximation rate, or whose sorting frequency meets a preset frequency, are sorted by comprehensive approximation rate to obtain the search result for the input trademark.
The embodiments of the invention can improve the matching effect of the same or similar images in the trademark identification and retrieval, so as to improve the recall ratio and precision ratio of the same or similar trademarks.
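The scoring and sorting stage described above can be illustrated with a minimal sketch. The text here does not give the formulas for the single-item approximation rate or the comprehensive approximation rate, so the sketch assumes a set-overlap (Jaccard) measure for each single item and a weighted mean for the comprehensive rate; the names `Candidate`, `single_rate`, `comprehensive_rate`, `rank`, the weight and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A preliminary search sample trademark with its minimum units (hypothetical layout)."""
    name: str
    image_units: frozenset
    text_units: frozenset


def single_rate(query_units, sample_units):
    """Single-item rate as set overlap (Jaccard); an assumed measure, not the patent's formula."""
    union = query_units | sample_units
    if not union:
        return 0.0
    return len(query_units & sample_units) / len(union)


def comprehensive_rate(image_rate, text_rate, image_weight=0.5):
    """Weighted mean of the single-item rates; the weighting is an assumption."""
    return image_weight * image_rate + (1.0 - image_weight) * text_rate


def rank(query_image_units, query_text_units, candidates, threshold=0.3):
    """Keep candidates whose comprehensive rate meets the threshold, best first."""
    scored = []
    for c in candidates:
        img = single_rate(query_image_units, c.image_units)
        txt = single_rate(query_text_units, c.text_units)
        score = comprehensive_rate(img, txt)
        if score >= threshold:
            scored.append((c.name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A candidate sharing every minimum unit with the input trademark scores 1.0 under these assumptions, and candidates below the threshold are dropped before sorting.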
Specifically, the trademark identification and retrieval method provided by the application can be applied to the application environment shown in fig. 1. The terminal 102 may communicate with the server 104 through a network, so as to obtain relevant data related to the input trademark, the sample trademark, other sample trademarks and the sample trademark database, and it should be noted that the terminal 102 may not communicate with the server 104, and the relevant data may be stored in the terminal 102 in advance and then processed; the terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a trademark identification retrieval method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step S210, converting the image data of the input trademark by retrieving a sample image database to obtain an image feature descriptor and associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
the input trademark comprises an input trademark recorded in a picture form and an input trademark recorded in a character form; the sample image comprises a sample image recorded in a picture form and a sample image recorded in a text form; namely, the trademark processed by the embodiment of the invention can be in a picture form or a character form.
The sample image comprises a trademark pattern, an appearance design pattern, artwork patterns registered by copyright, Chinese character patterns, non-Chinese character patterns and a custom image; the sample image database comprises a sample trademark database, a trademark constituent element sample image database, a character dictionary database and a word dictionary database.
Furthermore, the constituent elements of a trademark identification image are varied, and so are the factors that affect trademark similarity. To retrieve trademarks effectively and obtain a better recall ratio, the search keywords and their algorithm must be determined scientifically and reasonably. The combination unit data can represent arbitrary local information of the trademark image, and the processing of the sample images and the input trademark may include conversion, segmentation, combination and other steps. This processing allows the invention to obtain more complete trademark search keywords.
In each embodiment of the invention, the associated text information of a sample image comprises the recorded trademark graphic element code of the sample image, the name of the object depicted by the sample image, and the text that can be read from the sample image together with the shape, pronunciation and meaning features of its characters. The shape, pronunciation and meaning features comprise the graphic shape of the sample image or the written form, pronunciation and meaning of its characters, as well as characters of similar shape, similar pronunciation and similar meaning.
The image feature descriptor is an image feature representation that records input trademarks or sample images having the same perceptual content or features with identical or highly similar character strings, and records those having different perceptual content or features with different character strings. The representation takes the form of one or more sets of character strings describing the image features of the input trademark or sample image.
In a specific embodiment, the step of obtaining the image feature descriptor and the associated text information of the input trademark by converting the image data of the input trademark through searching the sample image database comprises:
extracting an image feature descriptor of an input trademark recorded in picture form; searching the sample image database based on the image feature descriptor, treating the sample image corresponding to a matched image feature descriptor as an image identical or highly similar to the image of the input trademark, and determining the image feature descriptor and associated text information recorded for that sample image as the image feature descriptor and associated text information of the input trademark entered in picture form; and
searching the sample image database based on the characters of an input trademark recorded in text form, and determining the image feature descriptor and associated text information recorded for the sample image corresponding to the matched sample characters as the image feature descriptor and associated text information of the input trademark recorded in text form.
Specifically, the implementation process may include converting the picture file of an input trademark represented in picture form, and the characters of an input trademark recorded in text form, into image feature descriptors and associated text information respectively.
The method for converting the picture file of an input trademark represented in picture form into an image feature descriptor and associated text information comprises: first, extracting the image feature descriptor of the picture file using a prior-art method; second, using the mass data already recorded for the sample images, searching the sample image database based on the image feature descriptor to obtain the matched image feature descriptor, the corresponding sample image and its associated text information, and using this information as the image feature descriptor and associated text information of the input trademark. The associated text information comprises the text and the shape, pronunciation and meaning features recorded for the matched sample image, such as the trademark graphic element code of a graphic trademark, the name of the object depicted in the graphic trademark, the written form, pronunciation and meaning of the trademark characters, and characters of similar shape, similar pronunciation and similar meaning.
The method for converting the characters of an input trademark recorded in text form into an image feature descriptor and associated text information comprises: first, searching the sample image database using the characters of the input trademark as keywords to obtain matched sample characters; second, finding the sample image and associated text information corresponding to the matched sample characters. The associated text information comprises the text and the shape, pronunciation and meaning features recorded for the matched sample image, such as the trademark graphic element code of a graphic trademark, the name of the object depicted in the graphic trademark, the image feature descriptor representing the image, the written form, pronunciation and meaning of the trademark characters, and characters of similar shape, similar pronunciation and similar meaning. The characters comprise Chinese characters, non-Chinese characters, numbers and symbols.
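The two conversion paths can be sketched as lookups against a pre-built sample image database. This is only an illustration under simplifying assumptions: the descriptor of a picture is taken as already extracted by some prior-art method, matching is exact rather than "identical or highly similar", and the names `SampleRecord`, `SampleImageDatabase`, `convert_picture` and `convert_text`, as well as the record fields, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """One sample image entry (hypothetical layout)."""
    descriptor: str   # image feature descriptor string
    text: str         # characters readable in the sample image
    associated: dict = field(default_factory=dict)  # associated text information


class SampleImageDatabase:
    """In-memory stand-in for the sample image database; exact-match lookup only."""

    def __init__(self, records):
        self._by_descriptor = {r.descriptor: r for r in records}
        self._by_text = {r.text: r for r in records}

    def convert_picture(self, descriptor):
        """Picture path: look up a descriptor already extracted from the picture file."""
        return self._by_descriptor.get(descriptor)

    def convert_text(self, text):
        """Text path: use the entered characters as the search keyword."""
        return self._by_text.get(text)


# Hypothetical sample data for illustration only.
db = SampleImageDatabase([
    SampleRecord("D1", "sun", {"object_name": "sun", "element_code": "code-001"}),
])
picture_match = db.convert_picture("D1")
text_match = db.convert_text("sun")
```

Either path returns the same record, so the input trademark inherits both the image feature descriptor and the associated text information recorded for the matched sample image.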
Step S220, the image feature descriptors and the associated text information of the input trademark are respectively segmented to obtain minimum units of the image feature descriptors and the associated text information of the input trademark; the minimum unit of the image feature descriptor is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one character or a plurality of character combinations with meanings corresponding to any text information characteristic point represented by the associated text information;
Specifically, segmenting the image feature descriptor means identifying its minimum units and separating them from one another; segmenting the associated text information likewise means identifying its minimum units and separating them.
The image feature descriptor minimum unit is defined as follows: the character strings of an image feature descriptor are generally used to represent feature points of an image, and the one or more character strings corresponding to each feature point are called an image feature descriptor minimum unit. In a specific embodiment, the image feature descriptor is a feature descriptor representing the correspondence between the position data of any pixel on an image contour line or image skeleton line and a coordinate region of a standard coordinate system of any specification; the image feature descriptor minimum unit is the position data of one or more pixels of an image contour line or image skeleton line corresponding to any coordinate region of a standard coordinate system of any specification.
The image feature descriptor generally describes a plurality of image feature points, so there may be a plurality of image feature descriptor minimum units. The process of segmenting the image feature descriptor of the input trademark may include: separating the image feature points represented by the image feature descriptor, and treating the one or more character strings corresponding to each image feature point as an image feature descriptor minimum unit.
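The grid-based embodiment can be sketched as follows. The sketch assumes that contour pixels are given as integer (x, y) positions inside an image of known width and height, and that a "standard coordinate system" of 10 × 10 or 20 × 20 specification means an even grid of that many cells; the function names and the "row-col" string encoding are hypothetical.

```python
from collections import defaultdict


def grid_minimum_units(contour_pixels, width, height, grid=10):
    """Group contour pixels by the grid cell (row, col) they fall into.

    Each cell's pixel list plays the role of one minimum unit (an assumed reading).
    """
    units = defaultdict(list)
    for x, y in contour_pixels:
        # Integer division maps a pixel to its cell; clamp so edge pixels stay in range.
        col = min(x * grid // width, grid - 1)
        row = min(y * grid // height, grid - 1)
        units[(row, col)].append((x, y))
    return dict(units)


def descriptor_strings(units):
    """Encode each occupied cell as a 'row-col' string, sorted for stable output."""
    return [f"{row}-{col}" for row, col in sorted(units)]


# Hypothetical contour pixels on a 100 x 100 image.
pixels = [(0, 0), (5, 5), (99, 99), (12, 3)]
units_10 = grid_minimum_units(pixels, 100, 100, grid=10)
units_20 = grid_minimum_units(pixels, 100, 100, grid=20)
```

A finer grid separates pixels that share a cell in the coarser one, which is one way to read the 10 × 10 and 20 × 20 specifications listed in the drawings.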
The associated text information minimum unit is defined as follows: the characters of the associated text information are generally used to represent feature points of the text information, and the single character or meaningful combination of characters corresponding to each feature point is called an associated text information minimum unit. In a specific embodiment, the associated text information minimum unit is the data of a character or meaningful word corresponding to the associated text information represented by any character or character combination.
The associated text information generally has a plurality of feature points, so there may be a plurality of associated text information minimum units.
The process of segmenting the associated text information of the input trademark may include: separating the text feature points represented by the associated text information, and treating the single character or meaningful combination of characters corresponding to each text feature point as an associated text information minimum unit. The characters in the associated text information include Chinese characters, non-Chinese characters (i.e., characters of other languages), numbers and symbols.
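One way to picture this segmentation is greedy longest-match against a word dictionary, with any unmatched character standing alone as its own minimum unit. The text does not name a segmentation algorithm, so forward maximum matching is a swapped-in technique here, and `segment_text`, `vocabulary` and `max_len` are hypothetical names.

```python
def segment_text(text, vocabulary, max_len=4):
    """Split text into minimum units by greedy longest match.

    A piece found in `vocabulary` counts as a meaningful character combination;
    any character that starts no known word becomes a single-character unit.
    """
    units = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                units.append(piece)
                i += size
                break
    return units


# Hypothetical dictionaries for illustration only.
latin_units = segment_text("redsun8", {"red", "sun"})
chinese_units = segment_text("金太阳", {"太阳"})
```

In the first call the digit has no dictionary entry and becomes a single-character unit; in the second, the two-character word is kept together while the leading character stands alone.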
Step S230, respectively combining each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
Specifically, the minimum units are combined according to a preset combination rule to obtain combination unit data, which can represent any local information of the trademark image. The combination unit data is the plurality of character strings corresponding to any local feature represented by the image feature descriptor or the associated text information.
An isolated image feature descriptor minimum unit or associated text information minimum unit may have no practical significance on its own. In the embodiment of the present invention, the minimum units are combined according to the preset minimum unit combination rule, so that each resulting combination of image feature descriptor minimum units or of associated text information minimum units carries a specific meaning.
In a specific embodiment, the preset minimum unit combination rule may include an image feature descriptor minimum unit combination rule and an associated text information minimum unit combination rule; the image feature descriptor combination unit data includes data representing a connected component combination unit, data representing a line segment combination unit, and data representing a character string for storage; the associated text information combination unit data comprises character combination unit data, character pronunciation combination unit data, character meaning combination unit data and trademark graphic element coding combination unit data;
Specifically, the combination rules for the image feature descriptor minimum units and the associated text information minimum units can be established according to application requirements; the minimum units are then combined according to these rules to obtain the image feature descriptor combination unit data and the associated text information combination unit data.
It should be noted that the image feature descriptor combination unit data acquired in the embodiment of the present invention may represent connected domain combination unit data, line segment combination unit data, or character string data for storage processing. The associated text information combination unit data may represent the combination unit data of a character, of a sentence, or of a word or group of characters forming a relatively independent part.
Further, in a specific embodiment, the image feature descriptor minimum unit combination rule may include an image feature descriptor minimum unit combination rule of an image contour line and an image feature descriptor minimum unit combination rule of an image skeleton line;
the image feature descriptor minimum unit combination rule of the image contour line comprises the following steps: all line segments on any image contour line are determined as an image integral combination unit; confirming a closed loop line on any image contour line as a connected domain combination unit; confirming the line segment on any first preset fixed-length image contour line as a line segment combination unit; wherein, the value range of the first preset fixed length is more than or equal to 20% of the total length of the line segments on the image contour line;
the image feature descriptor minimum unit combination rule of the image skeleton line comprises the following steps: all line segments on any image skeleton line are determined as an image integral combination unit; determining uninterrupted connection lines on any image skeleton line as a connected domain combination unit; confirming the line segment on any second preset fixed-length image skeleton line as a line segment combination unit; and the value range of the second preset fixed length is more than or equal to 20% of the total length of the line segments on the image skeleton line.
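The line segment combination rule above can be illustrated with a minimal sketch. It assumes that a contour line or skeleton line is already available as an ordered list of minimum units, and that the preset fixed length is the smallest whole number of units that is at least 20% of the line. The function name `line_segment_units` and the sliding-window enumeration are illustrative assumptions, not part of the described method; the sample sequence is one of the contours from the 10 × 10 example descriptor in this description.

```python
def line_segment_units(units, percent=20):
    """Return every run of consecutive minimum units of one preset fixed length.

    Assumption: the preset fixed length is the smallest whole number of
    minimum units that is at least `percent` percent of the line's length.
    """
    total = len(units)
    if total == 0:
        return []
    # Integer ceiling of total * percent / 100, avoiding floating-point error.
    window = max(1, -(-total * percent // 100))
    # Slide the fixed-length window along the line, one unit at a time.
    return [tuple(units[i:i + window]) for i in range(total - window + 1)]


# One contour of the 10 x 10 example descriptor, as an ordered unit list.
contour = [62, 63, 64, 65, 74, 73, 83, 82, 72]
segments = line_segment_units(contour)  # fixed length 2 for 9 units
whole_unit = tuple(contour)             # the image integral combination unit
```

Under these assumptions, each tuple in `segments` could serve as one line segment combination unit, while `whole_unit` corresponds to the image integral combination unit. Detecting closed loops for the connected domain combination unit is not covered by this sketch.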
The combined unit data and any local information of the trademark image obtained by the combination processing are a plurality of character strings of any local characteristic represented by the corresponding image characteristic descriptor or the associated text information.
Further, in a specific embodiment, for the minimum unit of the associated text information of the input trademark, the following steps may be taken to be combined: splitting characters of input trademarks one by one to obtain a minimum unit of associated text information; combining the minimum units of the associated text information according to the minimum unit combination rule of the associated text information to obtain data of each character combination unit; the associated text information minimum unit combination rule comprises the following steps: determining characters which are same in size, color and language and are tightly connected as a connected combined character unit; determining the connected combined character units with fixed preset character numbers as local combined units; wherein the fixed length of the preset number of characters is a value which is more than 20% of the total number of characters of the connected combined character units;
acquiring character pronunciation matched with the character combination unit data from a character dictionary database, and labeling the character pronunciation in each character combination unit data according to the character pronunciation to obtain character pronunciation combination unit data;
acquiring word combinations matched with the word combination unit data from a word dictionary database to obtain word meaning combination unit data;
and taking the trademark graphic element codes of the input trademark as the trademark graphic element code combination unit data. The trademark graphic element code is a tool for classifying trademark graphic elements, established under the Vienna Agreement Establishing an International Classification of the Figurative Elements of Marks. It consists of a list of trademark graphic elements classified by category, division and section, each entry comprising a trademark graphic element number and a trademark graphic element name.
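The character splitting and local combination steps above can be sketched as follows. The sketch assumes that one connected combined character unit is already given as a string, and that the preset number of characters is the smallest count strictly greater than 20% of that unit's characters. The names `split_minimum_units` and `local_combination_units` and the sample text are hypothetical.

```python
def split_minimum_units(text):
    """Split text one character at a time into minimum units."""
    return [ch for ch in text if not ch.isspace()]


def local_combination_units(connected_unit, percent=20):
    """Return every run of consecutive characters of the preset length.

    Assumption: the preset length is the smallest character count that is
    strictly more than `percent` percent of the connected unit's length.
    """
    total = len(connected_unit)
    if total == 0:
        return []
    size = min(total, total * percent // 100 + 1)
    return ["".join(connected_unit[i:i + size])
            for i in range(total - size + 1)]


# Hypothetical connected combined character unit with nine characters.
minimum_units = split_minimum_units("GREATWALL")
local_units = local_combination_units(minimum_units)
```

The pronunciation and meaning combination units would then be looked up per character combination unit in the character and word dictionary databases, which this sketch does not model.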
Step S240, searching a sample trademark database in the sample image database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each matched preliminary search sample trademark, each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary search sample trademark;
Here, matching means that combination unit data obtained from the input trademark by the aforementioned processing (i.e., any local information of the trademark image) is the same as combination unit data recorded in the sample trademark database, so that the sample trademark corresponding to the recorded combination unit data can be acquired.
Specifically, the image feature descriptor combination unit data and the associated text information combination unit data of the input trademark are used as search keywords to search the sample trademark database in the sample image database. The search returns each matched preliminary search sample trademark together with its associated image, the text and the shape, sound and meaning feature information recorded for the trademark, and its minimum unit and combination unit data.
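One way to picture this keyword search is an inverted index from combination units to sample trademarks. The following is a minimal sketch under that assumption; the description does not state how the sample trademark database is organized, and the identifiers `build_index`, `preliminary_search` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_index(samples):
    """Map each combination unit to the ids of the samples that record it."""
    index = defaultdict(set)
    for sample_id, units in samples.items():
        for unit in units:
            index[unit].add(sample_id)
    return index


def preliminary_search(index, query_units):
    """Return the ids of samples sharing at least one combination unit."""
    hits = set()
    for unit in query_units:
        hits |= index.get(unit, set())
    return hits


# Hypothetical sample records: image units are tuples, text units are strings.
samples = {
    "S1": {(62, 63), (63, 64), "GR"},
    "S2": {(81, 82), "WA"},
    "S3": {(10, 20)},
}
index = build_index(samples)
matched = preliminary_search(index, [(62, 63), "WA", "ZZ"])
```

A sample is returned as soon as one of its recorded combination units is identical to a combination unit of the input trademark, which corresponds to the notion of matching described above.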
Step S250, obtaining a single-item approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary retrieval sample trademark, the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark; processing the single-item approximation rate to obtain the comprehensive approximation rate of the initial search sample trademark and the input trademark;
The single-item approximation rate is derived from the minimum unit matching rate and the minimum unit mismatching rate. The minimum unit matching rate is the proportion of minimum units in which the sample trademark matches the input trademark in shape, sound, meaning, and search keyword, respectively; the minimum unit mismatching rate is the proportion of minimum units in which the sample trademark does not match the input trademark in those respects. The minimum units of the shape, sound, meaning, and search keywords of the input trademark may be represented by the image feature descriptor minimum units of the corresponding image.
In a specific embodiment, the associated text information minimum unit may include a chinese minimum unit and a non-chinese minimum unit; the single term approximation rate can comprise a Chinese single term approximation rate, a non-Chinese single term approximation rate and an image characteristic single term approximation rate;
in step S250, the step of obtaining the single-item approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary search sample trademark, and the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark includes:
acquiring, for the input trademark, the total number of Chinese minimum units, the total number of non-Chinese minimum units and the total number of image feature descriptor minimum units; acquiring the same three totals for the minimum units of the preliminary search sample trademark that match the input trademark; and acquiring the same three totals for the minimum units of the preliminary search sample trademark that do not match the input trademark;
the Chinese minimum unit matching rate is obtained based on the following formula:
$$M_{a1} = (U_{a1} \div U_{01}) \times 100\%$$
wherein $M_{a1}$ represents the Chinese minimum unit matching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, and $U_{a1}$ represents the total number of Chinese minimum units of the preliminary search sample trademark that match the input trademark;
the non-Chinese minimum unit matching rate is obtained based on the following formula:
$$M_{a2} = (U_{a2} \div U_{02}) \times 100\%$$
wherein $M_{a2}$ represents the non-Chinese minimum unit matching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, and $U_{a2}$ represents the total number of non-Chinese minimum units of the preliminary search sample trademark that match the input trademark;
the image feature descriptor minimum unit matching rate is obtained based on the following formula:
$$M_{a0} = (U_{a0} \div U_{00}) \times 100\%$$
wherein $M_{a0}$ represents the image feature descriptor minimum unit matching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, and $U_{a0}$ represents the total number of image feature descriptor minimum units of the preliminary search sample trademark that match the input trademark;
the Chinese minimum unit mismatching rate is obtained based on the following formula:
$$M_{i1} = (U_{c1} \div U_{01}) \times 100\% + (n_1 - 1) \times \omega_1$$
wherein $M_{i1}$ represents the Chinese minimum unit mismatching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, $U_{c1}$ represents the total number of Chinese minimum units of the preliminary search sample trademark that do not match the input trademark, $n_1$ represents the number of positions on the Chinese minimum unit combination connecting line at which the preliminary search sample trademark does not match the input trademark, and $\omega_1$ represents the weight of the position count $n_1$; wherein $\omega_1 \le 80\%$;
the non-Chinese minimum unit mismatching rate is obtained based on the following formula:
$$M_{i2} = (U_{c2} \div U_{02}) \times 100\% + (n_2 - 1) \times \omega_2$$
wherein $M_{i2}$ represents the non-Chinese minimum unit mismatching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, $U_{c2}$ represents the total number of non-Chinese minimum units of the preliminary search sample trademark that do not match the input trademark, $n_2$ represents the number of positions on the non-Chinese minimum unit combination connecting line at which the preliminary search sample trademark does not match the input trademark, and $\omega_2$ represents the weight of the position count $n_2$; wherein $\omega_2 \le 80\%$;
the image feature descriptor minimum unit mismatching rate is obtained based on the following formula:
$$M_{i0} = (U_{c0} \div U_{00}) \times 100\% + (n_0 - 1) \times \omega_0$$
wherein $M_{i0}$ represents the image feature descriptor minimum unit mismatching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, $U_{c0}$ represents the total number of image feature descriptor minimum units of the preliminary search sample trademark that do not match the input trademark, $n_0$ represents the number of positions on the image feature descriptor minimum unit combination connecting line at which the preliminary search sample trademark does not match the input trademark, and $\omega_0$ represents the weight of the position count $n_0$; wherein $\omega_0 \le 80\%$;
the Chinese single-term approximation rate is obtained based on the following formula:
$$M_1 = M_{a1} - M_{i1} \times \beta_1$$
wherein $M_1$ represents the Chinese single-term approximation rate, and $\beta_1$ represents the weight of $M_{i1}$; wherein $\beta_1 \le 80\%$;
the non-Chinese single-term approximation rate is obtained based on the following formula:
$$M_2 = M_{a2} - M_{i2} \times \beta_2$$
wherein $M_2$ represents the non-Chinese single-term approximation rate, and $\beta_2$ represents the weight of $M_{i2}$; wherein $\beta_2 \le 80\%$;
the image feature single-term approximation rate is obtained based on the following formula:
$$M_0 = M_{a0} - M_{i0} \times \beta_0$$
wherein $M_0$ represents the image feature single-term approximation rate, and $\beta_0$ represents the weight of $M_{i0}$; wherein $\beta_0 \le 80\%$.
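The three kinds of rate share one pattern, which the following minimal sketch applies to a single term (Chinese, non-Chinese, or image feature). Rates are expressed as fractions of 1 rather than percentages, and the function names and the sample counts are hypothetical. The formulas are applied literally, including the $(n - 1)$ factor; how a term with no minimum units at all should be treated is not stated in the description and is left to the caller.

```python
from fractions import Fraction

MAX_WEIGHT = Fraction(4, 5)  # both weights are limited to at most 80%


def matching_rate(matched, total):
    """Ma = Ua / U0, as a fraction of 1."""
    return Fraction(matched, total)


def mismatching_rate(unmatched, total, positions, omega):
    """Mi = Uc / U0 + (n - 1) * omega, as a fraction of 1."""
    if omega > MAX_WEIGHT:
        raise ValueError("omega must not exceed 80%")
    return Fraction(unmatched, total) + (positions - 1) * omega


def single_term_rate(matched, unmatched, total, positions, omega, beta):
    """M = Ma - Mi * beta, as a fraction of 1."""
    if beta > MAX_WEIGHT:
        raise ValueError("beta must not exceed 80%")
    mi = mismatching_rate(unmatched, total, positions, omega)
    return matching_rate(matched, total) - mi * beta


# Hypothetical counts: 5 input units, 4 matched, 1 unmatched at 1 position.
rate = single_term_rate(matched=4, unmatched=1, total=5, positions=1,
                        omega=Fraction(1, 10), beta=Fraction(1, 2))
```

With these hypothetical values the matching rate is 4/5, the mismatching rate is 1/5, and the single-term approximation rate is 4/5 − 1/5 × 1/2 = 7/10, i.e. 70%.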
Further, the comprehensive approximation rate is obtained based on the following formula:
$$M = (M_1 + M_2 + M_0) \div \mu$$
wherein $\mu$ represents the number of terms among $M_1$, $M_2$ and $M_0$ that are not 0.
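As a minimal sketch, the comprehensive approximation rate is the mean of the non-zero single-term rates. The function name `comprehensive_rate` is hypothetical, rates are given as fractions of 1, and returning 0 when all three terms are 0 is an assumption, since the description does not define that case.

```python
def comprehensive_rate(m1, m2, m0):
    """M = (M1 + M2 + M0) / mu, where mu counts the non-zero terms."""
    terms = (m1, m2, m0)
    mu = sum(1 for term in terms if term != 0)
    if mu == 0:
        # Assumption: the description does not define the all-zero case.
        return 0.0
    return sum(terms) / mu


# Hypothetical trademark with Chinese text and a graphic but no other text.
overall = comprehensive_rate(0.7, 0.0, 0.5)
```

In this example the non-Chinese term is 0 and is therefore excluded from the divisor, so the two remaining terms are averaged rather than all three.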
In a specific embodiment, the non-Chinese minimum unit is an English minimum unit; the non-Chinese minimum unit matching rate is the English minimum unit matching rate; the non-Chinese minimum unit mismatching rate is the English minimum unit mismatching rate; the non-Chinese single-term approximation rate is English single-term approximation rate;
the image feature descriptor minimum unit combination connecting line is an image feature line; the Chinese minimum unit combination connecting line is the trajectory line formed by the minimum units of the shape, sound and meaning features of the Chinese trademark characters, taken in their order of arrangement; the non-Chinese minimum unit combination connecting line is the trajectory line formed by the minimum units of the shape, sound and meaning features of the non-Chinese trademark characters, taken in their order of arrangement.
For example: assume that the readable text in the trademark is "blue global village" (five characters, glossed one by one as blue, color, ground, ball, village). The minimum units of the shape feature information are the individual readable characters in the trademark, and the reading trajectory, either left to right "blue-color-ground-ball-village" or right to left "village-ball-ground-color-blue", is the minimum unit combination connecting line.
The sample trademark image obtained by the searching and matching is a matching result generated by taking the combination unit data of the input trademark and any local information of the trademark image as a searching keyword, embodies the commonality of the combination unit of the minimum unit of Chinese, English and image feature descriptors, and is the comprehensive reflection of the trademark image feature.
In the embodiments of the present invention, the comprehensive approximation rate between the preliminary search sample trademark and the input trademark is calculated, which can also be realized by referring to the prior art, for example, by using the method discussed in the patent application No. 201710553009.9 entitled "method and apparatus for evaluating and ranking approximation degree of trademark query result".
And step S260, sequencing the primary retrieval sample trademarks of which the comprehensive approximation rates meet the preset requirements to obtain retrieval results.
In a specific embodiment, the primary search sample trademarks with the comprehensive approximation rate of more than or equal to 30% are screened out, the screened primary search sample trademarks are ranked, and the primary search sample trademarks with the ranking rank of less than or equal to 500 are taken as the search results.
In practical application, the minimum unit matching rate, the mismatching rate, the comprehensive approximation rate and the preset ranking rank can be preset according to application requirements, generally, the preset minimum unit matching rate takes a value greater than 30%, the preset minimum unit mismatching rate takes a value less than 70%, the preset comprehensive approximation rate takes a value greater than 30%, and the preset ranking rank takes a value less than 500.
The preset ranking orders the matched sample trademarks by their comprehensive approximation rates. A matched sample trademark that meets the preset ranking rank is regarded as a trademark identical or highly similar to the input trademark.
The same search sample trademark obtained in the previous steps may match a plurality of search keywords, so the search results may contain repeated records with the same trademark registration number. These repeated records are meaningless for trademark search and should be de-duplicated. Specifically, the trademark records with the same registration number in the same commodity category are sorted by the calculated comprehensive approximation rate, only the one record with the highest comprehensive approximation rate is kept, and the other records with that registration number in that commodity category are deleted.
After the calculation of the steps, the matched sample trademark meeting the preset ranking rank can be used as a retrieval sample trademark, and the ranking result of the retrieval sample trademark can be reported as a trademark retrieval result.
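The screening, de-duplication and ranking described above can be sketched as follows, using the example thresholds from the specific embodiment (comprehensive approximation rate of at least 30%, rank of at most 500). The record fields `reg_no`, `goods_class` and `rate` and the function name `rank_results` are hypothetical.

```python
def rank_results(records, min_rate=0.30, max_rank=500):
    """Screen by rate, keep the best record per trademark, then rank."""
    best = {}
    for rec in records:
        if rec["rate"] < min_rate:
            continue  # below the preset comprehensive approximation rate
        key = (rec["reg_no"], rec["goods_class"])
        # De-duplicate: keep only the highest-rated record per key.
        if key not in best or rec["rate"] > best[key]["rate"]:
            best[key] = rec
    ranked = sorted(best.values(), key=lambda r: r["rate"], reverse=True)
    return ranked[:max_rank]


# Hypothetical preliminary search results, including one repeated trademark.
records = [
    {"reg_no": "A1", "goods_class": 9, "rate": 0.55},
    {"reg_no": "A1", "goods_class": 9, "rate": 0.80},
    {"reg_no": "B2", "goods_class": 9, "rate": 0.25},
    {"reg_no": "C3", "goods_class": 25, "rate": 0.65},
]
results = rank_results(records)
```

Here the trademark "A1" appears twice in the same commodity category, so only its higher-rated record survives, and "B2" is screened out for falling below the threshold.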
The trademark identification and retrieval method uses the massive trademark and knowledge data already in the system to identify the text of an input trademark and to estimate and acquire its shape, sound and meaning feature information. It converts the picture file of an input trademark or sample trademark represented in picture form, and the characters of an input trademark or sample trademark recorded in character form, into image feature descriptors and associated text information, and then segments and combines the result to obtain the combination unit data of the trademark, i.e., any local information of the trademark image. Using the associated information of big data, it estimates and identifies features that are not directly displayed in the image of the input trademark or sample trademark, such as the pronunciation of the image, the meaning of character combinations, and trademark graphic element codes, and it takes the acquired combination unit data as search keywords. This overcomes the defects of manual entry, in which the local combinations formed by shape-similar, sound-similar and meaning-similar characters and graphics in an input trademark image are hard to enumerate exhaustively, search keywords tend to be inconsistent, and extracted information is easily missed. The method can effectively identify both standard and non-standard text images and avoids the omission of search keywords to which traditional methods are prone, so that search keywords are obtained automatically, accurately and comprehensively. It thereby replaces manual entry with intelligent automatic identification and entry, improves identification accuracy, improves the matching of identical or similar trademarks in trademark identification retrieval, raises the recall ratio and the precision ratio for identical or similar trademarks, and can effectively improve the efficiency of trademark search work.
To further illustrate the technical solution of the present invention, especially taking the practical application of the trademark identification and retrieval method of the present invention as an example, in an embodiment, as shown in fig. 3, a trademark identification and retrieval method is provided, which is described by taking the method applied to the terminal in fig. 1 as an example, and includes the following steps:
step S310, establishing a sample image database, and performing feature extraction, segmentation and combination processing on the sample image to obtain combination unit data of the sample image and store the combination unit data in the sample image database;
in a specific embodiment, the step of establishing the sample image database may specifically include:
collecting each sample image, and extracting and storing an image feature descriptor of each sample image;
inputting associated text information of the sample image;
dividing the image feature descriptors and carrying out combination processing according to the minimum unit combination rule of the image feature descriptors to obtain the minimum unit of each image feature descriptor and the combination unit data of each image feature descriptor;
splitting characters in the associated text information of the sample image one by one to obtain a minimum unit of the associated text information; combining the minimum units of the associated text information according to the minimum unit combination rule of the associated text information to obtain data of each character combination unit; the associated text information minimum unit combination rule comprises the following steps: determining characters which are same in size, color and language and are tightly connected as a connected combined character unit; determining the connected combined character units with fixed preset character numbers as local combined units; wherein the fixed length of the preset number of characters is a value which is more than 20% of the total number of characters of the connected combined character units;
acquiring character pronunciation matched with the character combination unit data from a character dictionary database, and labeling the character pronunciation in each character combination unit data according to the character pronunciation to obtain character pronunciation combination unit data;
acquiring word combinations matched with the word combination unit data from a word dictionary database to obtain word meaning combination unit data;
and coding each trademark graphic element of the sample trademark mark to be confirmed as trademark graphic element coding combination unit data.
In a specific embodiment, the sample image is subjected to segmentation and combination processing by referring to the feature extraction processing procedure described in step S320 below, and the sample image is taken as a processing object, so as to obtain minimum units and combination unit data of the sample image.
And step S320, converting, dividing and combining the input trademark to obtain the image feature descriptor, the associated text information and the combined unit data of the input trademark.
Specifically, Fig. 4 and Fig. 5 show input trademarks chosen arbitrarily as examples: the first exemplary image is a trademark pattern of hua technology limited, and the second exemplary image is a graphic trademark of "great wall" composed of clerical-script characters. Such patterns can be used as input trademarks in the embodiment of the present invention.
In the embodiment of the present invention, a specific implementation process of performing image feature descriptor extraction, segmentation, and combination processing on a sample image (or an input trademark) is further described with reference to fig. 4 and 5:
the method for converting the input trademark comprises the following steps:
the method for converting the picture file of the input trademark represented in the picture form into the image feature descriptor and the associated text information comprises the following steps:
firstly, extracting an image feature descriptor of a sample image represented in a picture form or a picture file of an input trademark by adopting a method in the prior art;
taking Fig. 4 as an example, the image feature descriptor (image contour line descriptor) may be extracted by the method of the invention patent application No. 201710553007.X, entitled "method and apparatus for acquiring image contour line descriptor". The image feature descriptor of the contour line based on a standard coordinate system of the 10 × 10 specification is:
3,4,5,15,25,35,45,55,65,55,45,44,34,24,23,13;
6,7,8,18,28,27,37,47,56,66,56,46,36,26,16;
12,23,33,34,44,54,55,65,64,54,53,43,42,32,31,21,22;
19,29,30,40,50,49,48,58,57,67,66,56,57,47,37,38,28,29;
41,42,52,53,54,64,65,64,63,62,61,51;
49,50,60,70,69,68,67,57,58,59;
62,63,64,65,74,73,83,82,72;
67,68,69,70,80,79,89,88,78,77;
81,82,92,91;
82,83,93,94,84,94,93,92;
84,85,95,96,95,94;
85,95;
86,96,97,87,97,98,88,98,97,87,97,96;
88,89,90,89,90,100,99,100,99,98;
90,100.
the image feature descriptor of the contour line based on a standard coordinate system of a 20 × 20 specification is:
7,8,9,30,50,70,90,110,130,150,170,190,210,230,250,230,229,209,189,188,168,148,147,127,107,106,86,66,46,26,27;
12,13,14,34,35,55,75,95,115,114,134,154,174,173,193,212,232,231,251,231,211,191,171,151,131,111,91,71,51,52,32;
44,64,85,105,106,126,127,147,167,168,188,208,209,229,249,248,228,227,206,205,185,184,164,163,143,142,122,102,82,83,63;
58,78,98,99,119,139,159,179,178,198,197,196,216,215,235,234,233,253,252,232,233,213,193,194,174,154,155,135,115,116,96,97,77;
161,162,182,183,184,204,205,225,226,227,247,248,269,268,267,266,265,264,263,243,242,222,221,201,181;
179,180,200,220,240,260,259,258,278,277,276,275,274,273,253,254,234,235,236,216,217,197,198,199;
263,264,265,266,267,268,269,288,287,307,306,325,324,304,303,283;
273,274,275,276,277,278,279,299,298,318,317,337,336,315,314,294,293;
321,341,342,343,323,324,344,364,384,383,363,362,361,381,361,341;
324,325,345,365,385,386,367,347,327,347,367,387,386,385,384,364,344;
329,330,350,370,371,391,390,370,369,388,368,348,349;349,350,370,369;
331,332,352,372,373,353,333,334,354,374,375,355,335,336,356,376,375,395,394,374,354,353,373,393,392,372,371,351;
337,338,339,359,358,357,358,359,379,378,377,398,399,398,397,377,376,356,357;
340,360,380,400,380,360.
Fig. 6 shows the correspondence between the position data of the pixel points on the contour line of the image in Fig. 4 and the coordinate regions of the standard coordinate system of the 10 × 10 specification.
Fig. 7 shows the correspondence between the position data of the pixel points on the contour line of the image in Fig. 4 and the coordinate regions of the standard coordinate system of the 20 × 20 specification.
Taking Fig. 5 as an example, the image feature descriptor (image contour line descriptor) may be extracted by the method of the invention patent application No. 201710553007.X, entitled "method and apparatus for acquiring image contour line descriptor". The image feature descriptor of the contour line based on a standard coordinate system of the 10 × 10 specification is:
6,7,17,27,37,27,28,18,8,9,19,29,30,40,39,49,39,40,50,60,59,69,70,80,90,100,99,89,79,89,88,98,88,78,88,87,97,96,86,87,77,67,77,76,75,65,66,56,46,36,26,16;
38,48;
47,57;
58,68;
58,59,69,79,78,68;
2,12,22,23,13,14,4,14,24,23,33,32,42,43,44,34,35,45,55,54,53,63,64,74,75,85,95,94,84,74,73,83,93,92,82,72,62,52,51,41,31,41,42,32,22,12;
52,53,52,53,63,73,72,62;
9,10,20,19,29,19.
the image feature descriptor of the contour line based on a standard coordinate system of a 20 × 20 specification is:
16,17,37,57,77,97,98,118,119,120,140,160,159,158,157,177,197,198,178,158,159,179,199,219,239,238,258,278,279,299,319,320,340,360,380,400,399,398,378,358,338,337,317,337,357,356,376,356,355,335,315,316,315,295,315,335,334,354,374,373,372,352,332,333,313,293,294,274,273,293,292,312,311,291,290,270,251,252,232,212,192,191,171,151,131,132,112,92,72,52,32,33,53,73,93,113,133,134,114,94,95,115,116,96,76,56,36;
155,156,176,175;
173,174,194,214,234,233,213,193;
215,216,236,256,276,275,255,235;
216,217,237,257,277,297,296,276,256,236;
3,4,24,44,64,84,85,65,66,46,47,27,28,48,68,88,87,107,106,126,125,124,144,164,165,166,167,168,148,149,169,189,209,208,207,206,205,225,226,246,247,267,268,288,289,309,310,330,350,370,390,389,388,368,367,347,327,307,306,326,345,365,364,384,383,363,343,323,303,283,263,243,223,203,202,222,221,201,181,161,141,142,162,163,143,123,103,83,63,43,23;
204,205,204,224,225,245,265,266,286,285,305,304,284,264,244,224;
18,19,39,59,79,78,98,97,77,78,58,38.
Fig. 8 shows the correspondence between the position data of the pixel points on the contour line of the image in Fig. 5 and the coordinate regions of the standard coordinate system of the 10 × 10 specification.
Fig. 9 shows the correspondence between the position data of the pixel points on the contour line of the image in Fig. 5 and the coordinate regions of the standard coordinate system of the 20 × 20 specification.
Secondly, the sample image database is searched based on the image feature descriptor, using the massive data of the recorded sample trademarks, to obtain the matched image feature descriptor together with the sample image and the associated text information corresponding to it; this information is used as the image feature descriptor and associated text information of the input trademark. The associated text information comprises: the trademark graphic element codes recorded for the graphic trademark in the matched sample image, the names of the objects depicted in the graphic trademark, the writing form, pronunciation and meaning of the trademark characters, and text and shape, sound and meaning features such as shape-similar, sound-similar and meaning-similar characters.
The method for converting the characters of an input trademark entered in character form into the image feature descriptor and the associated text information comprises the following steps:
firstly, searching a sample image database based on characters of an input trademark input in a character form as key words to obtain matched sample characters; taking fig. 5 as an example, the entered text of the input trademark is "great wall", and the sample image database is searched using "great wall" as a keyword, so that a record of the matched sample text "great wall" can be obtained.
Secondly, finding out a sample image and associated text information corresponding to the matched sample characters, wherein the associated text information comprises: the matching sample image records trademark graphic element codes in a graphic trademark, object names described in the graphic trademark, image feature descriptors for representing the image, writing forms, pronunciation, meanings of trademark characters, and texts and phono-configurational features such as characters, phonetic characters, and phono-configurational features. The characters comprise Chinese characters, non-Chinese characters, numbers and symbols.
In the above example, the sample trademark and the associated text information corresponding to the word "great wall" are obtained through searching, and fig. 5 may be one of the corresponding sample trademarks, where the associated text information includes: the image feature descriptor of the image formed by various writing forms of the character of the great wall, the pronunciation and meaning of the character of the great wall, the characters with the shape and the pronunciation, the characters with the shape and the meaning, and the like.
Secondly, respectively segmenting the image feature descriptors and the associated text information;
the image feature descriptor is segmented, namely, the minimum unit of the image feature descriptor is identified, each minimum unit of the image feature descriptor is segmented, and the associated text information is segmented, namely, the minimum unit of the associated text information is identified, and each minimum unit of the associated text information is segmented.
As described above, the image feature descriptor of the image is used to represent the feature point of the image, which is the corresponding relationship between the position data of a certain pixel in the image contour and the coordinate region of the standard coordinate system of a certain specification, and therefore, the position data of one or more pixels in the image contour corresponding to one coordinate region of the standard coordinate system of each specification can be regarded as the minimum unit of the image feature descriptor.
As an example in fig. 7, the minimum unit of the image feature descriptor of "3, 4, 5, 15, 25, 35, 45, 55, 65, 55, 45, 44, 34, 24, 23, 13" in the image feature descriptor of the standard coordinate system with the outline based on the 10 × 10 specification is each number in the descriptor, that is: "3", "4", "5", "15", "25", "35", "45", "55", "65", "55", "45", "44", "34", "24", "23", "13".
As another example in fig. 7, the minimum unit of the image feature descriptor of "7, 8, 9, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230, 250, 230, 229, 209, 189, 188, 168, 148, 147, 127, 107, 106, 86, 66, 46, 26, 27" in the image feature descriptor of the contour line based on the standard coordinate system of the 20 × 20 specification is each number in the descriptor, that is: "7", "8", "9", "30", "50", "70", "90", "110", "130", "150", "170", "190", "210", "230", "250", "230", "229", "209", "189", "188", "168", "148", "147", "127", "107", "106", "86", "66", "46", "26", "27".
Taking an input trademark entered as the characters "great wall" (长城) as an example, the associated text information in terms of shape is "great wall", and each character is a minimum unit of the input trademark; that is, "长" (long) and "城" (city) are each a minimum unit of the input trademark.
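The segmentation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the invention: it assumes a descriptor is stored as a comma-separated string of coordinate region numbers, and that regions are numbered row by row starting from 1 (an inference from the examples, where moving down one row of the 10 × 10 grid adds 10 to the number). The function names are hypothetical, and "长城" is used as the Chinese text for "great wall".

```python
def split_descriptor(descriptor: str) -> list[str]:
    """Split a comma-separated descriptor into its minimum units."""
    return [unit.strip() for unit in descriptor.split(",") if unit.strip()]


def region_to_row_col(region: int, grid_size: int) -> tuple[int, int]:
    """Decode a 1-based region number into a 0-based (row, column) pair.

    Assumption: regions are numbered row by row (row-major order).
    """
    if not 1 <= region <= grid_size * grid_size:
        raise ValueError("region number lies outside the grid")
    return divmod(region - 1, grid_size)


def split_text(text: str) -> list[str]:
    """Split trademark text into one minimum unit per character."""
    return [ch for ch in text if not ch.isspace()]


units = split_descriptor("3, 4, 5, 15, 25, 35, 45, 55, 65, 55, 45, 44, 34, 24, 23, 13")
print(units[:4])                  # ['3', '4', '5', '15']
print(region_to_row_col(15, 10))  # (1, 4)
print(split_text("长城"))          # ['长', '城']
```

Keeping the units as strings mirrors the text, which treats each number in the descriptor as a character string; decoding to a row and column is only needed when the geometry matters.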
Thirdly, performing combined processing on the minimum unit data;
and combining the minimum unit data according to a preset combination rule to obtain combined unit data and any local information of the trademark image, wherein the combined unit data are a plurality of character strings corresponding to any local characteristic represented by the image characteristic descriptor or the associated text information.
The specific method for acquiring the combination unit data and any local information of the trademark image comprises the following steps:
firstly, according to the application requirement, establishing a preset image feature descriptor minimum unit combination rule, wherein the preset image feature descriptor minimum unit combination rule specifically comprises:
the image feature descriptor minimum unit combination rule of the image contour line comprises: 1) all line segments on each image contour line are regarded as one image integral combination unit; 2) each closed loop line on an image contour line is regarded as one connected domain combination unit; 3) each line segment of a first preset fixed length on an image contour line is regarded as one line segment combination unit, wherein the first preset fixed length may be 20% or more of the total length of the line segments.
The image feature descriptor minimum unit combination rule of the image skeleton line comprises: 1) all line segments on each image skeleton line are regarded as one image integral combination unit; 2) each uninterrupted connection line on an image skeleton line is regarded as one connected domain combination unit; 3) each line segment of a second preset fixed length on an image skeleton line is regarded as one line segment combination unit, wherein the second preset fixed length may be 20% or more of the total length of the line segments.
Secondly, combining the minimum units of the image feature descriptors according to the preset minimum unit combination rule of the image feature descriptors to respectively obtain combination unit data of the image feature descriptors and any local information of the trademark image.
In some embodiments of the present invention, the obtained image feature descriptor combination unit data may represent connected domain combination unit data, and may also represent line segment combination unit data. Each item of connected domain combination unit data is a piece of local information of the image represented by the image feature descriptor.
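Rule 3) for line segment combination units can be illustrated as follows. This is a minimal sketch under assumptions the text does not state: the number of minimum units is used as a stand-in for line length, the fixed length is rounded up to a whole number of units, and the window slides one unit at a time. The function name `line_segment_units` is hypothetical.

```python
import math


def line_segment_units(units: list[str], ratio: float = 0.2) -> list[list[str]]:
    """Return every run of consecutive minimum units of one fixed length.

    The fixed length is ratio * len(units), rounded up; the text requires
    the ratio to be at least 20%.
    """
    if not 0.2 <= ratio <= 1.0:
        raise ValueError("ratio must lie between 0.2 and 1.0")
    if not units:
        return []
    length = math.ceil(ratio * len(units))
    # Assumption: windows advance by one minimum unit at a time.
    return [units[i:i + length] for i in range(len(units) - length + 1)]


contour = "81,82,92,91".split(",")
for segment in line_segment_units(contour, ratio=0.5):
    print(",".join(segment))
```

With `ratio=1.0` the function returns the whole line as a single unit, which corresponds to the image integral combination unit of rule 1).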
As an example of the image feature descriptor whose outline is based on the standard coordinate system of the 10 × 10 specification in fig. 5, any local information of each connected component or image includes the following:
“3,4,5,15,25,35,45,55,65,55,45,44,34,24,23,13”,
“6,7,8,18,28,27,37,47,56,66,56,46,36,26,16”,
“12,23,33,34,44,54,55,65,64,54,53,43,42,32,31,21,22”,
“19,29,30,40,50,49,48,58,57,67,66,56,57,47,37,38,28,29”,
“41,42,52,53,54,64,65,64,63,62,61,51”,
“49,50,60,70,69,68,67,57,58,59”,
“62,63,64,65,74,73,83,82,72”,
“67,68,69,70,80,79,89,88,78,77”,
“81,82,92,91”,
“82,83,93,94,84,94,93,92”,
“84,85,95,96,95,94”,
“85,95”,
“86,96,97,87,97,98,88,98,97,87,97,96”,
“88,89,90,89,90,100,99,100,99,98”,
“90,100”.
Specific methods for identifying, segmenting and combining the minimum units of the associated text information of different information objects are respectively described as follows:
1. Readable text content, namely the text contained in the trademark.
The characters contained in the trademark include Chinese characters, characters of domestic minority nationalities and foreign characters; the minority nationality characters and foreign characters can be further subdivided by language. The characters contained in the trademark are split one by one so that each character becomes one character minimum unit of the trademark, and the number of characters is the number of minimum units. Combining the character minimum units means combining the characters according to the following combination rule to obtain the character minimum unit combination unit data (namely the character combination unit data):
the trademark character minimum unit combination rule comprises: 1) characters that are the same in size, color and language and are closely connected are regarded as one connected combined character unit; 2) each run of a preset fixed number of characters on a connected combined character unit is regarded as one local combined unit, wherein the preset fixed number of characters may be more than 20% of the total number of characters of the connected combined character unit.
2. Readable character pronunciation.
For the character minimum unit combination unit data obtained above (namely the character combination unit data), the matched character pronunciations can be obtained from the character dictionary database and labeled on the data, giving the character pronunciation minimum unit combination unit data (namely the character pronunciation combination unit data).
3. Readable meaning of character combinations.
For the above character minimum unit combination unit data and the character combinations of the trademark, a matching word combination is regarded as a character combination with meaning; where the characters of the trademark cannot be combined into any word, all of the characters of the trademark together are regarded as one character combination without meaning. Each such combination can be regarded as one item of character meaning minimum unit combination unit data (namely character meaning combination unit data).
4. Trademark graphic element codes.
When one trademark has a plurality of trademark graphic element codes, each trademark graphic element code is regarded as one item of trademark graphic element code minimum unit combination unit data (namely trademark graphic element code combination unit data).
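The character combination and pronunciation labeling of items 1 and 2 can be sketched as follows. This is an illustration only: the two-entry `CHAR_PRONUNCIATION` table is a toy stand-in for the character dictionary database, tone marks are omitted, a character with several readings is given just one, and the function names are hypothetical. The input is assumed to be a single connected combined character unit.

```python
import math

# Toy stand-in for the character dictionary database (illustrative only).
CHAR_PRONUNCIATION = {"长": "chang", "城": "cheng"}


def character_units(text: str, ratio: float = 0.5) -> list[str]:
    """Return every run of a fixed number of consecutive characters.

    The fixed number is ratio * character count, rounded up; the text
    requires the ratio to be more than 20%.
    """
    if not 0.2 < ratio <= 1.0:
        raise ValueError("ratio must be above 0.2 and at most 1.0")
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return []
    length = math.ceil(ratio * len(chars))
    return ["".join(chars[i:i + length]) for i in range(len(chars) - length + 1)]


def pronunciation_units(units: list[str]) -> list[str]:
    """Label each character unit with its pronunciation ('?' if unknown)."""
    return [" ".join(CHAR_PRONUNCIATION.get(ch, "?") for ch in unit) for unit in units]


whole = character_units("长城", ratio=1.0)
print(whole)                       # ['长城']
print(pronunciation_units(whole))  # ['chang cheng']
```

A character missing from the table is labeled "?" rather than dropped, so each pronunciation unit stays aligned with its character unit.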
In a specific embodiment, a sample trademark database is established, and the combination unit data and the local information of the trademark image obtained by converting, segmenting and combining the sample trademark data through the above steps are stored in the sample trademark database.
Step S330, searching the sample trademark database based on the combination unit data to obtain the matched preliminary search sample trademarks, the images related to those sample trademarks, the text and the shape, pronunciation and meaning feature information recorded for them, and the matched minimum unit and combination unit data.
Step S340, calculating the single-item approximation rates and the comprehensive approximation rate between each preliminary search sample trademark and the input trademark in terms of shape, pronunciation, meaning and retrieval keywords.
Step S350, sorting by comprehensive approximation rate the retrieved sample trademarks that meet the preset single-item approximation rate and comprehensive approximation rate and/or that fall within the preset ranking, and reporting the retrieval result.
In a specific embodiment, the specific implementation of steps S330 to S350 may refer to the processing procedures described in steps S240 to S260 for inputting the trademark, perform search matching on the input trademark with the input trademark as the processing object, and obtain the final search result.
The invention provides a trademark identification and retrieval method. Using the massive trademark and knowledge data already held in the system, the method recognizes the text of an input trademark image and infers its shape, pronunciation and meaning feature information. It converts both the picture file of an input trademark or sample trademark recorded in picture form and the characters of an input trademark or sample trademark recorded in text form into image feature descriptors and associated text information, and then segments and combines the result to obtain the combination unit data of the input trademark or sample trademark and any local information of the trademark image. Using the associated information in the big data, it infers and identifies features that are not directly displayed in the image of the input trademark or sample trademark, such as the pronunciation of the characters, the meaning of character combinations and the trademark graphic element codes, and uses the acquired combination unit data and local information of the trademark image as retrieval keywords. This overcomes the defects of earlier manual entry, in which the local combinations formed by shape-similar characters, sound-similar characters, meaning-similar characters and graphics in an input trademark image were hard to enumerate exhaustively, retrieval keywords were easily inconsistent, and extracted information was easily omitted. The method can effectively identify both standard and non-standard text images, avoids the omission of retrieval keywords to which conventional methods are prone, and addresses the automation, intelligence, accuracy and comprehensiveness of retrieval keyword acquisition in trademark retrieval. It thereby moves from manual entry to intelligent automatic identification and entry, improves identification accuracy, improves the matching of identical or similar trademarks in trademark identification retrieval, improves their recall and precision, and can effectively improve the working efficiency of trademark retrieval.
It should be understood that, although the steps in the flowcharts of fig. 2 and 3 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a trademark identification retrieval apparatus including:
the conversion module 110 is configured to convert the image data of the input trademark by searching a sample image database to obtain an image feature descriptor and associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
a segmentation module 120, configured to segment the image feature descriptors and the associated text information of the input trademark, respectively, to obtain minimum units of the image feature descriptors and minimum units of the associated text information of the input trademark; the minimum unit of the image feature descriptor is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one character or a plurality of character combinations with meanings corresponding to any text information characteristic point represented by the associated text information;
the combination module 130 is configured to respectively combine each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
a retrieval module 140, configured to retrieve a sample trademark database in the sample image database based on the image feature descriptor combining unit data and the associated text information combining unit data to obtain each preliminary retrieval sample trademark and each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary retrieval sample trademark that are matched;
the approximation rate acquisition module 150 is configured to acquire the single-item approximation rates according to each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary search sample trademark and each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark, and to process the single-item approximation rates to obtain the comprehensive approximation rate between the preliminary search sample trademark and the input trademark;
and the sorting module 160 is configured to sort the preliminary search sample trademarks with the comprehensive approximation rate meeting the preset requirement, so as to obtain a search result.
In a specific embodiment, the input trademark comprises an input trademark recorded in a picture form and an input trademark recorded in a text form; the sample image comprises a sample image recorded in a picture form and a sample image recorded in a text form;
the sample image comprises a trademark pattern, an appearance design pattern, artwork patterns registered by copyright, Chinese character patterns, non-Chinese character patterns and a custom image; the sample image database also comprises a trademark constituent element sample image database, a character dictionary database and a word dictionary database;
the associated text information of the sample image comprises the trademark graphic element codes recorded for the sample image, the names of the objects depicted in the sample image, and the readable text of the sample image together with the shape, pronunciation and meaning features of its characters; the shape, pronunciation and meaning features comprise the graphic representation form of the sample image or the writing form, pronunciation and meaning of the characters, as well as shape-similar characters, sound-similar characters and meaning-similar characters;
the image feature descriptor is an image feature representation form which records the same perception content or feature in the input trademark or sample image by adopting the same or highly similar character strings, and records the different perception content or feature in the input trademark or sample image by adopting different character strings; the image feature representation is a set of one or more sets of character strings describing image features of the input trademark or sample image;
the preset minimum unit combination rule comprises an image feature descriptor minimum unit combination rule and an associated text information minimum unit combination rule; the image feature descriptor combination unit data includes data representing a connected component combination unit, data representing a line segment combination unit, and data representing a character string for storage; the associated text information combination unit data comprises character combination unit data, character pronunciation combination unit data, character meaning combination unit data and trademark graphic element coding combination unit data;
the device further comprises a database establishing module configured to establish the sample image database.
In a specific embodiment, the words in the associated text message include Chinese words, foreign words of various languages, numbers and symbols;
the database establishing module is configured to: collect the sample images, and extract and store an image feature descriptor of each sample image; enter the associated text information of each sample image; segment the image feature descriptors and combine them according to the image feature descriptor minimum unit combination rule to obtain the image feature descriptor minimum units and the image feature descriptor combination unit data; split the characters in the associated text information of the sample image one by one to obtain the associated text information minimum units; and combine the associated text information minimum units according to the associated text information minimum unit combination rule to obtain the character combination unit data; the associated text information minimum unit combination rule comprises: determining characters that are the same in size, color and language and are closely connected as a connected combined character unit; determining each run of a preset fixed number of characters on a connected combined character unit as a local combined unit, wherein the preset fixed number of characters is more than 20% of the total number of characters of the connected combined character unit; acquiring, from the character dictionary database, the character pronunciations matching the character combination unit data, and labeling each item of character combination unit data with them to obtain the character pronunciation combination unit data; acquiring, from the word dictionary database, the word combinations matching the character combination unit data to obtain the character meaning combination unit data; and determining each trademark graphic element code of the sample trademark as trademark graphic element code combination unit data.
In a specific embodiment, the conversion module is configured to extract an image feature descriptor of an input trademark recorded in picture form; search the sample image database based on the image feature descriptor, regard the sample image corresponding to the matched image feature descriptor as an image that is the same as or highly similar to the image of the input trademark, and determine the image feature descriptor and the associated text information recorded for that sample image as the image feature descriptor and the associated text information of the input trademark recorded in picture form; and
search the sample image database based on the characters of an input trademark recorded in text form, and determine the image feature descriptor and the associated text information recorded for the sample image corresponding to the matched sample characters as the image feature descriptor and the associated text information of the input trademark recorded in text form.
In a specific embodiment, the image feature descriptor minimum unit combination rule may include an image feature descriptor minimum unit combination rule of an image contour line and an image feature descriptor minimum unit combination rule of an image skeleton line;
the image feature descriptor minimum unit combination rule of the image contour line comprises the following steps: all line segments on any image contour line are determined as an image integral combination unit; confirming a closed loop line on any image contour line as a connected domain combination unit; confirming the line segment on any first preset fixed-length image contour line as a line segment combination unit; wherein, the value range of the first preset fixed length is more than or equal to 20% of the total length of the line segments on the image contour line;
the image feature descriptor minimum unit combination rule of the image skeleton line comprises the following steps: all line segments on any image skeleton line are determined as an image integral combination unit; determining uninterrupted connection lines on any image skeleton line as a connected domain combination unit; confirming the line segment on any second preset fixed-length image skeleton line as a line segment combination unit; and the value range of the second preset fixed length is more than or equal to 20% of the total length of the line segments on the image skeleton line.
In a specific embodiment, the image feature descriptor is a feature descriptor used for representing the corresponding relation between position data of any pixel point of an image contour line or an image skeleton line and a coordinate region of a standard coordinate system of any specification;
the minimum unit of the image feature descriptor is position data of one or more pixel points of an image contour line or an image skeleton line corresponding to any coordinate region of a standard coordinate system with any specification;
the minimum unit of the associated text information is the data of a meaningful character or word corresponding to any character or character combination represented by the associated text information.
In a specific embodiment, the associated text information minimum unit comprises a Chinese minimum unit and a non-Chinese minimum unit; the single-item approximation rate comprises a Chinese single-item approximation rate, a non-Chinese single-item approximation rate and an image feature single-item approximation rate;
the approximation rate acquisition module is configured to:
acquire the total number of Chinese minimum units, the total number of non-Chinese minimum units and the total number of image feature descriptor minimum units of the input trademark; the total number of Chinese minimum units, the total number of non-Chinese minimum units and the total number of image feature descriptor minimum units of the preliminary search sample trademark that match the input trademark; and the total number of Chinese minimum units, the total number of non-Chinese minimum units and the total number of image feature descriptor minimum units of the preliminary search sample trademark that do not match the input trademark;
the Chinese minimum unit matching rate is obtained based on the following formula:
$M_{a1} = (U_{a1} \div U_{01}) \times 100\%$
wherein $M_{a1}$ represents the Chinese minimum unit matching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, and $U_{a1}$ represents the total number of Chinese minimum units of the preliminary search sample trademark that match the input trademark;
the non-Chinese minimum unit matching rate is obtained based on the following formula:
$M_{a2} = (U_{a2} \div U_{02}) \times 100\%$
wherein $M_{a2}$ represents the non-Chinese minimum unit matching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, and $U_{a2}$ represents the total number of non-Chinese minimum units of the preliminary search sample trademark that match the input trademark;
the image feature descriptor minimum unit matching rate is obtained based on the following formula:
$M_{a0} = (U_{a0} \div U_{00}) \times 100\%$
wherein $M_{a0}$ represents the image feature descriptor minimum unit matching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, and $U_{a0}$ represents the total number of image feature descriptor minimum units of the preliminary search sample trademark that match the input trademark;
the Chinese minimum unit mismatching rate is obtained based on the following formula:
$M_{i1} = (U_{c1} \div U_{01}) \times 100\% + (n_1 - 1) \times \omega_1$
wherein $M_{i1}$ represents the Chinese minimum unit mismatching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, $U_{c1}$ represents the total number of Chinese minimum units of the preliminary search sample trademark that do not match the input trademark, $n_1$ represents the number of positions at which the preliminary search sample trademark and the input trademark do not match on the Chinese minimum unit combination connecting line, and $\omega_1$ represents the weight of $n_1$; wherein $\omega_1 \le 80\%$;
the non-Chinese minimum unit mismatching rate is obtained based on the following formula:
$M_{i2} = (U_{c2} \div U_{02}) \times 100\% + (n_2 - 1) \times \omega_2$
wherein $M_{i2}$ represents the non-Chinese minimum unit mismatching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, $U_{c2}$ represents the total number of non-Chinese minimum units of the preliminary search sample trademark that do not match the input trademark, $n_2$ represents the number of positions at which the preliminary search sample trademark and the input trademark do not match on the non-Chinese minimum unit combination connecting line, and $\omega_2$ represents the weight of $n_2$; wherein $\omega_2 \le 80\%$;
the image feature descriptor minimum unit mismatching rate is obtained based on the following formula:
$M_{i0} = (U_{c0} \div U_{00}) \times 100\% + (n_0 - 1) \times \omega_0$
wherein $M_{i0}$ represents the image feature descriptor minimum unit mismatching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, $U_{c0}$ represents the total number of image feature descriptor minimum units of the preliminary search sample trademark that do not match the input trademark, $n_0$ represents the number of positions at which the preliminary search sample trademark and the input trademark do not match on the image feature descriptor minimum unit combination connecting line, and $\omega_0$ represents the weight of $n_0$; wherein $\omega_0 \le 80\%$;
the Chinese single-item approximation rate is obtained based on the following formula:
$M_1 = M_{a1} - M_{i1} \times \beta_1$
wherein $M_1$ represents the Chinese single-item approximation rate and $\beta_1$ represents the weight of $M_{i1}$; wherein $\beta_1 \le 80\%$;
the non-Chinese single-item approximation rate is obtained based on the following formula:
$M_2 = M_{a2} - M_{i2} \times \beta_2$
wherein $M_2$ represents the non-Chinese single-item approximation rate and $\beta_2$ represents the weight of $M_{i2}$; wherein $\beta_2 \le 80\%$;
the image feature single-item approximation rate is obtained based on the following formula:
$M_0 = M_{a0} - M_{i0} \times \beta_0$
wherein $M_0$ represents the image feature single-item approximation rate and $\beta_0$ represents the weight of $M_{i0}$; wherein $\beta_0 \le 80\%$.
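The three formula families above (matching rate, mismatching rate and single-item approximation rate) share one shape for the Chinese, non-Chinese and image feature cases, so a single set of functions can illustrate all of them. The following is a minimal sketch, not the implementation of the invention. Rates are handled as fractions (0.75 means 75%). Two behaviours are assumptions that the formulas do not address: a feature type with no minimum units in the input trademark scores 0, and the positional term is clamped at zero when there is no unmatched position. The function and parameter names are hypothetical.

```python
def match_rate(matched: int, total: int) -> float:
    """M_a = U_a / U_0, as a fraction."""
    # Assumption: a feature type absent from the input trademark scores 0.
    return matched / total if total else 0.0


def mismatch_rate(unmatched: int, total: int, positions: int, omega: float) -> float:
    """M_i = U_c / U_0 + (n - 1) * omega, with omega at most 80%."""
    if not 0.0 <= omega <= 0.8:
        raise ValueError("omega must not exceed 80%")
    if total == 0:
        return 0.0
    # Assumption: the positional term is clamped at zero when n = 0.
    return unmatched / total + max(positions - 1, 0) * omega


def single_item_rate(matched: int, unmatched: int, total: int,
                     positions: int, omega: float, beta: float) -> float:
    """M = M_a - M_i * beta, with beta at most 80%."""
    if not 0.0 <= beta <= 0.8:
        raise ValueError("beta must not exceed 80%")
    return (match_rate(matched, total)
            - mismatch_rate(unmatched, total, positions, omega) * beta)


# The input trademark has 4 Chinese minimum units; the sample trademark
# matches 3 of them and has 1 unmatched unit at a single position.
m1 = single_item_rate(matched=3, unmatched=1, total=4,
                      positions=1, omega=0.1, beta=0.5)
print(f"{m1:.1%}")  # 62.5%
```

In this example the matching rate is 75%, the mismatching rate is 25% (the positional term vanishes because there is one unmatched position), and the single-item approximation rate is 75% − 25% × 0.5 = 62.5%.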
In a specific embodiment, the approximation rate acquisition module is further configured to:
obtain the comprehensive approximation rate based on the following formula:
$M = (M_1 + M_2 + M_0) \div \mu$
wherein $M$ represents the comprehensive approximation rate and $\mu$ represents the number of terms among $M_1$, $M_2$ and $M_0$ that are not 0.
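The comprehensive approximation rate is therefore the average of the non-zero single-item rates. A short sketch follows; the function name is hypothetical, and returning 0 when all three rates are 0 is an assumption, since the formula is undefined in that case.

```python
def comprehensive_rate(m1: float, m2: float, m0: float) -> float:
    """M = (M1 + M2 + M0) / mu, where mu counts the non-zero terms."""
    rates = (m1, m2, m0)
    mu = sum(1 for rate in rates if rate != 0)
    # Assumption: with no non-zero term the result is taken to be 0.
    return sum(rates) / mu if mu else 0.0


# A trademark with Chinese text and a graphic, but no non-Chinese text.
print(comprehensive_rate(0.625, 0.0, 0.875))  # 0.75
```

Dividing by the count of non-zero terms rather than by 3 keeps a trademark that lacks one feature type from being penalized for that absence.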
In a specific embodiment, the non-Chinese minimum unit is an English minimum unit; the non-Chinese minimum unit matching rate is the English minimum unit matching rate; the non-Chinese minimum unit mismatching rate is the English minimum unit mismatching rate; the non-Chinese single-item approximation rate is the English single-item approximation rate;
the image feature descriptor minimum unit combination connecting line is an image feature line; the Chinese minimum unit combination connecting line is a track line formed, in their order of arrangement, by the minimum units of the shape, pronunciation and meaning features corresponding to the Chinese trademark characters; the non-Chinese minimum unit combination connecting line is a track line formed, in their order of arrangement, by the minimum units of the shape, pronunciation and meaning features corresponding to the non-Chinese trademark characters.
In a specific embodiment, the sorting module is configured to screen out the preliminary search sample trademarks whose comprehensive approximation rate is greater than or equal to 30%, sort the screened preliminary search sample trademarks, and take those whose rank is less than or equal to 500 as the search result.
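The screening and ranking performed by the sorting module can be sketched as follows. The thresholds of 30% and 500 come from the text; sorting in descending order of comprehensive approximation rate is an assumption, as is the representation of each candidate as a (name, rate) pair. The function name is hypothetical.

```python
def rank_results(candidates: list[tuple[str, float]],
                 min_rate: float = 0.30,
                 max_rank: int = 500) -> list[tuple[str, float]]:
    """Keep candidates at or above min_rate, sort them, and cut at max_rank."""
    kept = [item for item in candidates if item[1] >= min_rate]
    # Assumption: the most similar trademark is ranked first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_rank]


samples = [("A", 0.90), ("B", 0.20), ("C", 0.30), ("D", 0.75)]
print(rank_results(samples))  # [('A', 0.9), ('D', 0.75), ('C', 0.3)]
```

Because Python's sort is stable, candidates with equal rates keep their retrieval order.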
For the specific definition of the trademark identification retrieval device, reference may be made to the above definition of the trademark identification retrieval method, which is not described herein again. The modules in the trademark identification retrieval device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 11. The computer device comprises a processor, a memory, a network interface, a database, a display screen and an input device which are connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as trademark sample images and databases. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for recognizing image text and shape-pronunciation-meaning features. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory in which a computer program is stored and a processor that implements the steps of the above-described trademark identification retrieval method when the processor executes the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps in the above-described trademark identification retrieval method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this specification.
The above examples show only some embodiments of the present invention. Their description is specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A trademark identification retrieval method, characterized by comprising the following steps:
converting image data of an input trademark by retrieving a sample image database to obtain an image feature descriptor and associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
respectively segmenting the image feature descriptors and the associated text information of the input trademark to obtain minimum units of the image feature descriptors and the associated text information of the input trademark; the image feature descriptor minimum unit is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one word or a plurality of meaningful word combinations corresponding to any text information characteristic point represented by the associated text information;
respectively combining each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
searching a sample trademark database in the sample image database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each matched preliminary search sample trademark, each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary search sample trademark;
obtaining a single-item approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary retrieval sample trademark, and the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark; processing the single-item approximation rate to obtain a comprehensive approximation rate of the preliminary retrieval sample trademark and the input trademark;
and sorting the preliminary retrieval sample trademarks whose comprehensive approximation rate meets a preset requirement to obtain a retrieval result.
2. The trademark identification retrieval method according to claim 1,
the input trademark comprises an input trademark recorded in a picture form and an input trademark recorded in a character form; the sample image comprises a sample image recorded in a picture form and a sample image recorded in a text form;
the sample image comprises a trademark pattern, an appearance design pattern, artwork patterns registered by copyright, Chinese character patterns, non-Chinese character patterns and a custom image; the sample image database also comprises a trademark constituent element sample image database, a character dictionary database and a word dictionary database;
the associated text information of the sample image comprises the recorded trademark graphic element code of the sample image, the name of the thing depicted by the sample image, and the text that can be read from the sample image together with the shape, pronunciation and meaning features of its characters; the shape, pronunciation and meaning features comprise the graphic shape representation of the sample image, or the written form, pronunciation and meaning of the characters, as well as characters of similar shape, similar pronunciation and similar meaning;
the image feature descriptor is an image feature representation that records input trademarks or sample images having the same perceptual content or features with the same or highly similar character strings, and records input trademarks or sample images having different perceptual content or features with different character strings; the image feature representation is a set of one or more groups of character strings describing image features of the input trademark or the sample image;
the preset minimum unit combination rule comprises an image feature descriptor minimum unit combination rule and an associated text information minimum unit combination rule; the image feature descriptor combination unit data includes data representing a connected component combination unit, data representing a line segment combination unit, and data representing a character string for storage; the associated text information combination unit data comprises character combination unit data, character pronunciation combination unit data, character meaning combination unit data and trademark graphic element coding combination unit data;
the step of converting the image data of the input trademark by retrieving the sample image database to obtain the image feature descriptor and the associated text information of the input trademark further comprises the following step:
establishing the sample image database.
3. The trademark identification retrieval method according to claim 2, wherein the characters in the associated text information include Chinese characters, foreign characters of various languages, numbers and symbols;
the step of establishing the sample image database comprises:
collecting each sample image, and extracting and storing an image feature descriptor of each sample image;
inputting associated text information of the sample image;
dividing the image feature descriptors and carrying out combination processing according to the minimum unit combination rule of the image feature descriptors to obtain the minimum unit of each image feature descriptor and the combination unit data of each image feature descriptor;
splitting the characters in the associated text information of the sample image one by one to obtain the associated text information minimum units; combining the associated text information minimum units according to the associated text information minimum unit combination rule to obtain the character combination unit data; the associated text information minimum unit combination rule comprises: determining characters that have the same size, color and language and are closely connected as a connected combined character unit; determining any run of a preset fixed number of characters within the connected combined character unit as a local combination unit; wherein the preset fixed number of characters is a value greater than 20% of the total number of characters of the connected combined character unit;
acquiring the character pronunciation matched with the character combination unit data from the character dictionary database, and labeling the character pronunciation in each character combination unit data according to the character pronunciation to obtain the character pronunciation combination unit data;
acquiring word combinations matched with the word combination unit data from the word dictionary database to obtain the word meaning combination unit data;
and confirming each trademark graphic element code labeled on the sample trademark as the trademark graphic element code combination unit data.
4. The trademark identification retrieval method according to claim 2, wherein the step of obtaining the image feature descriptor and the associated text information of the input trademark by converting the image data of the input trademark through retrieving a sample image database comprises:
extracting the image feature descriptor of the input trademark recorded in picture form; searching the sample image database based on the image feature descriptor, regarding the sample image corresponding to the matched image feature descriptor as an image that is the same as or highly similar to the image of the input trademark, and confirming the image feature descriptor and associated text information recorded for that sample image as the image feature descriptor and associated text information of the input trademark recorded in picture form; and
and searching the sample image database based on the characters of the input trademark input in the character form, and confirming the image feature descriptors and the associated text information recorded in the sample image corresponding to the matched sample characters as the image feature descriptors and the associated text information of the input trademark input in the character form.
5. The trademark identification retrieval method according to claim 2, wherein the image feature descriptor minimum unit combination rule includes an image feature descriptor minimum unit combination rule of an image contour line and an image feature descriptor minimum unit combination rule of an image skeleton line;
the image feature descriptor minimum unit combination rule of the image contour line comprises: confirming all line segments on any image contour line as an image overall combination unit; confirming a closed loop line on any image contour line as a connected domain combination unit; confirming any line segment of a first preset fixed length on an image contour line as a line segment combination unit; wherein the first preset fixed length is greater than or equal to 20% of the total length of the line segments on the image contour line;
the image feature descriptor minimum unit combination rule of the image skeleton line comprises: confirming all line segments on any image skeleton line as an image overall combination unit; confirming an uninterrupted connecting line on any image skeleton line as a connected domain combination unit; confirming any line segment of a second preset fixed length on an image skeleton line as a line segment combination unit; wherein the second preset fixed length is greater than or equal to 20% of the total length of the line segments on the image skeleton line.
6. The trademark identification retrieval method according to claim 2,
the image feature descriptor is used for representing the corresponding relation between position data of any pixel point of an image contour line or an image skeleton line and a standard coordinate system coordinate region of any specification;
the minimum unit of the image feature descriptor is position data of one or more pixel points of the image contour line or the image skeleton line corresponding to any coordinate region of the standard coordinate system with any specification;
the minimum unit of the associated text information is data of words or vocabularies which are corresponding to the associated text information represented by any word or word combination and have meanings.
7. The trademark identification retrieval method according to any one of claims 1 to 6, wherein the associated text information minimum unit includes a Chinese minimum unit and a non-Chinese minimum unit; the single term approximation rate comprises a Chinese single term approximation rate, a non-Chinese single term approximation rate and an image characteristic single term approximation rate;
the step of obtaining the single-item approximation rate according to the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the preliminary retrieval sample trademark and the minimum unit of each image feature descriptor and the minimum unit of each associated text information of the input trademark comprises the following steps:
acquiring the total number of Chinese minimum units, the total number of non-Chinese minimum units and the total number of image feature descriptor minimum units of the input trademark; the total numbers of Chinese minimum units, non-Chinese minimum units and image feature descriptor minimum units of the preliminary retrieval sample trademark that match the input trademark; and the total numbers of Chinese minimum units, non-Chinese minimum units and image feature descriptor minimum units of the preliminary retrieval sample trademark that do not match the input trademark;
the Chinese minimum unit matching rate is obtained based on the following formula:
$$M_{a1} = (U_{a1} \div U_{01}) \times 100\%$$
wherein $M_{a1}$ represents the Chinese minimum unit matching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, and $U_{a1}$ represents the total number of Chinese minimum units of the preliminary retrieval sample trademark that match the input trademark;
obtaining a non-Chinese minimum unit matching rate based on the following formula:
$$M_{a2} = (U_{a2} \div U_{02}) \times 100\%$$
wherein $M_{a2}$ represents the non-Chinese minimum unit matching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, and $U_{a2}$ represents the total number of non-Chinese minimum units of the preliminary retrieval sample trademark that match the input trademark;
obtaining the minimum unit matching rate of the image feature descriptors based on the following formula:
$$M_{a0} = (U_{a0} \div U_{00}) \times 100\%$$
wherein $M_{a0}$ represents the image feature descriptor minimum unit matching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, and $U_{a0}$ represents the total number of image feature descriptor minimum units of the preliminary retrieval sample trademark that match the input trademark;
the Chinese minimum unit mismatching rate is obtained based on the following formula:
$$M_{i1} = (U_{c1} \div U_{01}) \times 100\% + (n_1 - 1) \times \omega_1$$
wherein $M_{i1}$ represents the Chinese minimum unit mismatching rate, $U_{01}$ represents the total number of Chinese minimum units of the input trademark, $U_{c1}$ represents the total number of Chinese minimum units of the preliminary retrieval sample trademark that do not match the input trademark, $n_1$ represents the number of positions at which the preliminary retrieval sample trademark and the input trademark do not match on the Chinese minimum unit combination connecting line, and $\omega_1$ represents the weight applied to $n_1$; wherein the value of $\omega_1$ is less than or equal to 80%;
the non-Chinese minimum unit mismatching rate is obtained based on the following formula:
$$M_{i2} = (U_{c2} \div U_{02}) \times 100\% + (n_2 - 1) \times \omega_2$$
wherein $M_{i2}$ represents the non-Chinese minimum unit mismatching rate, $U_{02}$ represents the total number of non-Chinese minimum units of the input trademark, $U_{c2}$ represents the total number of non-Chinese minimum units of the preliminary retrieval sample trademark that do not match the input trademark, $n_2$ represents the number of positions at which the preliminary retrieval sample trademark and the input trademark do not match on the non-Chinese minimum unit combination connecting line, and $\omega_2$ represents the weight applied to $n_2$; wherein the value of $\omega_2$ is less than or equal to 80%;
obtaining the image feature descriptor minimum unit mismatching rate based on the following formula:
$$M_{i0} = (U_{c0} \div U_{00}) \times 100\% + (n_0 - 1) \times \omega_0$$
wherein $M_{i0}$ represents the image feature descriptor minimum unit mismatching rate, $U_{00}$ represents the total number of image feature descriptor minimum units of the input trademark, $U_{c0}$ represents the total number of image feature descriptor minimum units of the preliminary retrieval sample trademark that do not match the input trademark, $n_0$ represents the number of positions at which the preliminary retrieval sample trademark and the input trademark do not match on the image feature descriptor minimum unit combination connecting line, and $\omega_0$ represents the weight applied to $n_0$; wherein the value of $\omega_0$ is less than or equal to 80%;
acquiring the Chinese single-term approximation rate based on the following formula:
$$M_1 = M_{a1} - M_{i1} \times \beta_1$$
wherein $M_1$ represents the Chinese single-item approximation rate and $\beta_1$ represents the weight of $M_{i1}$; wherein the value of $\beta_1$ is less than or equal to 80%;
obtaining the non-Chinese single-item approximation rate based on the following formula:
$$M_2 = M_{a2} - M_{i2} \times \beta_2$$
wherein $M_2$ represents the non-Chinese single-item approximation rate and $\beta_2$ represents the weight of $M_{i2}$; wherein the value of $\beta_2$ is less than or equal to 80%;
acquiring the image feature single-term approximation rate based on the following formula:
$$M_0 = M_{a0} - M_{i0} \times \beta_0$$
wherein $M_0$ represents the image feature single-item approximation rate and $\beta_0$ represents the weight of $M_{i0}$; wherein the value of $\beta_0$ is less than or equal to 80%.
8. The trademark identification retrieval method of claim 7, wherein the step of processing the single item approximation rate to obtain a comprehensive approximation rate of the preliminary retrieval sample trademark and the input trademark comprises:
obtaining the comprehensive approximation rate based on the following formula:
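$$M = (M_1 + M_2 + M_0) \div \mu$$
wherein $M$ represents the comprehensive approximation rate and $\mu$ represents the number of terms among $M_1$, $M_2$ and $M_0$ that are not 0.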
9. The trademark identification retrieval method of claim 7, wherein the non-Chinese minimum unit is an English minimum unit; the non-Chinese minimum unit matching rate is an English minimum unit matching rate; the non-Chinese minimum unit mismatching rate is an English minimum unit mismatching rate; the non-Chinese single-item approximation rate is an English single-item approximation rate;
the image feature descriptor minimum unit combination connecting line is an image feature line; the Chinese minimum unit combination connecting line is a track line formed, in arrangement order, by the minimum units made up of the shape, pronunciation and meaning features corresponding to the Chinese trademark characters; the non-Chinese minimum unit combination connecting line is a track line formed, in arrangement order, by the minimum units made up of the shape, pronunciation and meaning features corresponding to the non-Chinese trademark characters.
10. The trademark identification and retrieval method according to claim 7, wherein the step of sorting the preliminary retrieval sample trademarks with the comprehensive approximation rate meeting preset requirements to obtain retrieval results comprises the following steps:
screening out the preliminary retrieval sample trademarks whose comprehensive approximation rate is greater than or equal to 30%, sorting the screened preliminary retrieval sample trademarks, and taking the preliminary retrieval sample trademarks whose rank is less than or equal to 500 as the retrieval result.
11. A trademark identification retrieval apparatus, comprising:
the conversion module is used for converting the image data of the input trademark by retrieving a sample image database to obtain the image feature descriptor and the associated text information of the input trademark; the sample image database is a pre-established database containing image feature descriptors of sample images, associated text information, minimum unit data and combined unit data; the combined unit data is data representing any local information of the image;
the segmentation module is used for respectively segmenting the image feature descriptors and the associated text information of the input trademark to obtain minimum units of the image feature descriptors and the associated text information of the input trademark; the image feature descriptor minimum unit is one or more character strings corresponding to any image feature point represented by the image feature descriptor; the minimum unit of the associated text information is one word or a plurality of meaningful word combinations corresponding to any text information characteristic point represented by the associated text information;
the combination module is used for respectively combining each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark according to a preset minimum unit combination rule to obtain image feature descriptor combination unit data and associated text information combination unit data of the input trademark;
a retrieval module, configured to retrieve a sample trademark database in the sample image database based on the image feature descriptor combination unit data and the associated text information combination unit data to obtain each preliminary retrieval sample trademark and each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary retrieval sample trademark that are matched;
the approximation rate acquisition module is used for obtaining a single-item approximation rate according to each image feature descriptor minimum unit and each associated text information minimum unit of the preliminary retrieval sample trademark, and each image feature descriptor minimum unit and each associated text information minimum unit of the input trademark, and for processing the single-item approximation rate to obtain a comprehensive approximation rate of the preliminary retrieval sample trademark and the input trademark;
and the sorting module is used for sorting the preliminary retrieval sample trademarks whose comprehensive approximation rate meets the preset requirement to obtain a retrieval result.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.
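The rates defined in claims 7 and 8 can be illustrated with a short worked sketch. It is a literal transcription of the formulas under stated assumptions, not the patented implementation: the function names are hypothetical, rates are held as fractions rather than percentages, the weights in the example (0.1 and 0.5) are illustrative values chosen within the claimed limit of 80%, and a category with no minimum units in the input trademark is assumed to contribute a rate of 0 so that it is excluded from μ. The claims do not say how n = 0 is handled, so the term (n − 1) × ω is applied as written.

```python
def matching_rate(matched: int, total: int) -> float:
    """M_a = U_a / U_0, as a fraction. Assumption: 0.0 when the input has no units."""
    return matched / total if total else 0.0


def mismatching_rate(unmatched: int, total: int, positions: int, omega: float) -> float:
    """M_i = U_c / U_0 + (n - 1) * omega, as a fraction.

    `positions` is n, the number of unmatched positions on the combination
    connecting line; `omega` is its weight (claimed to be <= 0.8).
    Assumption: 0.0 when the input has no units of this category.
    """
    if total == 0:
        return 0.0
    return unmatched / total + (positions - 1) * omega


def single_item_rate(m_a: float, m_i: float, beta: float) -> float:
    """M = M_a - M_i * beta, where beta (<= 0.8) weights the mismatching rate."""
    return m_a - m_i * beta


def comprehensive_rate(m1: float, m2: float, m0: float) -> float:
    """M = (M1 + M2 + M0) / mu, where mu counts the terms that are not 0."""
    terms = (m1, m2, m0)
    mu = sum(1 for m in terms if m != 0)
    return sum(terms) / mu if mu else 0.0


if __name__ == "__main__":
    # Chinese text: 4 units in the input, 3 matched, 1 unmatched at 1 position.
    m1 = single_item_rate(matching_rate(3, 4), mismatching_rate(1, 4, 1, 0.1), 0.5)
    # No non-Chinese text in the input trademark.
    m2 = single_item_rate(matching_rate(0, 0), mismatching_rate(0, 0, 0, 0.1), 0.5)
    # Image features: 10 units, 8 matched, 2 unmatched at 2 positions.
    m0 = single_item_rate(matching_rate(8, 10), mismatching_rate(2, 10, 2, 0.1), 0.5)
    print(m1, m2, m0, comprehensive_rate(m1, m2, m0))
```

In this example the Chinese rate is 0.75 − 0.25 × 0.5 = 0.625, the image rate is approximately 0.8 − 0.3 × 0.5 = 0.65, and because the non-Chinese term is 0 the comprehensive rate is their mean over μ = 2, about 0.6375.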
CN201810481421.9A 2018-05-18 2018-05-18 Trademark identification retrieval method and device, computer equipment and storage medium Active CN108763380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810481421.9A CN108763380B (en) 2018-05-18 2018-05-18 Trademark identification retrieval method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108763380A CN108763380A (en) 2018-11-06
CN108763380B true CN108763380B (en) 2022-03-08

Family

ID=64008439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810481421.9A Active CN108763380B (en) 2018-05-18 2018-05-18 Trademark identification retrieval method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108763380B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 2002, Block A, 33 Jihua 5th Road, Chancheng District, Foshan, Guangdong 528000

Patentee after: Xu Qing

Patentee after: Foshan Guofang Identification Technology Co.,Ltd.

Patentee after: Foshan Guofang Software Technology Co.,Ltd.

Address before: Room 2002, Block A, 33 Jihua 5th Road, Chancheng District, Foshan, Guangdong 528000

Patentee before: Xu Qing

Patentee before: FOSHAN GUOFANG TRADEMARK SERVICE Co.,Ltd.

Patentee before: FOSHAN GUOFANG TRADEMARK IDENTIFICATION TECHNOLOGY Co.,Ltd.