CN110866086A - Article matching system - Google Patents


Info

Publication number
CN110866086A
Authority
CN
China
Prior art keywords
image
label
module
text
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811644858.6A
Other languages
Chinese (zh)
Inventor
郝汉
赵晓晨
杨胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Anne Full Copyright Technology Development Co Ltd
Original Assignee
Beijing Anne Full Copyright Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-29
Filing date
2018-12-29
Publication date
2020-03-06
Application filed by Beijing Anne Full Copyright Technology Development Co Ltd
Priority to CN201811644858.6A
Publication of CN110866086A
Current legal status: Pending

Abstract

The invention discloses an article matching system comprising the following modules: a tag extraction module, which extracts image tags from the image files in an image resource library and stores each tag in association with its image file; a tag labeling module, which adaptively screens the multiple image tags of each image file and assigns them weight values; a tag indexing module, which establishes a mapping between image tags and image files and retrieves the image files related to a given image tag; a text abstract extraction module, which imports a text file and extracts a text abstract from it; and an abstract and tag matching module, which matches the text abstract against the image tags by synonyms or near-synonyms. By integrating multiple advanced artificial intelligence processing algorithms with big data technology, the invention provides efficient illustration recommendations for articles and helps authors find suitable illustration resources.

Description

Article matching system
Technical Field
The invention relates to the field of information technology applications, and in particular to a system for matching illustrations to articles.
Background
Big data is a concept that has emerged with the explosive growth of information and the rapid development of network computing. It refers to data sets so large that their acquisition, storage, management and analysis far exceed the capabilities of traditional database software tools, and it has four characteristics: massive scale, fast data flow, diverse data types and low value density. Big data can help enterprises in many industries uncover user needs in large volumes of raw, individually low-value data, so that quantity turns into quality and real value is created. As big data has developed, its applications have spread to agriculture, industry, commerce, the service sector, medicine and other fields, and it has become an important factor in industrial development.
Natural language processing techniques are widely used in everyday life, for example in machine translation, handwritten and printed character recognition, transcription after speech recognition, information retrieval, extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining. These applications draw on syntactic analysis, semantic analysis, discourse understanding and other natural language processing techniques, which together form a leading research area of artificial intelligence.
At present, self-media writers and editors often struggle to find pictures to accompany their articles. On the one hand, some copyrighted pictures are very expensive (for example, those from Visual China and Eastern IC); on the other hand, writers are wary of using pictures of unknown origin. Matching pictures by combining artificial intelligence with big data is therefore likely to become the trend: artificial intelligence selects pictures whose scenes suit the sentences of an article, and big data is used to recommend properly licensed, inexpensive pictures for editors to choose from.
Disclosure of Invention
The invention aims to provide an article matching system that integrates multiple advanced artificial intelligence processing algorithms to efficiently aggregate related picture resources and to extract the meaning and theme of an article, thereby providing efficient illustration recommendations and effectively helping creators find suitable illustration resources.
In order to achieve the above object, the present invention provides an article matching system, including:
a tag extraction module, a tag labeling module, a tag indexing module, a text abstract extraction module, and an abstract and tag matching module;
the tag extraction module is used to extract the image tags of each image file in an image resource library and to store the image tags in association with the image file, where each image tag is associated with at least one image file;
the tag labeling module is used to adaptively screen the multiple image tags of each image file and to assign weight values to them;
the tag indexing module is used to establish a mapping between image tags and image files and to retrieve, for a given image tag, the image files related to it;
the text abstract extraction module is used to import a text file and extract a text abstract from it;
the abstract and tag matching module is used to match the text abstract against the image tags by synonyms or near-synonyms.
Preferably, the tag extraction module extracts image tags from the image files in the image resource library: for each image file, several keywords are extracted by an image classifier or manually, and these keywords are stored in association with the image file as its image tags.
Preferably, the image classifier is an image classification tool developed using the OpenCV vision library, a random forest algorithm and a logistic regression algorithm.
Preferably, the tag labeling module adaptively screens the image tags against the image file and assigns each tag a different weight value according to how well it matches the image file.
Preferably, the tag indexing module uses an image tag to retrieve the image files stored in association with it in the image resource library, and sorts the retrieved image files by the weight values of the tags.
Preferably, the system further comprises an aggregation module, which aggregates related pictures through the index according to their image tags, so that pictures can be tagged quickly when tagging is done manually.
Preferably, the text abstract extraction module uses a text extraction algorithm to extract a limited number of keywords that describe the text file of an article, and uses these keywords as the text abstract for matching against the image tags.
Preferably, the text extraction algorithm is a natural language processing algorithm.
Preferably, the abstract and tag matching module matches the text abstract against the image tags in the image resource library by synonyms or near-synonyms, calls the tag indexing module to retrieve the image files associated with the successfully matched image tags, sorts those image files by the weight values of the tags, and then pushes them to the front-end user.
Preferably, the image resource library collects and updates image resources using crawler technology, and each image file in the image resource library further carries copyright information and price information.
The beneficial effects of the invention are as follows: the system extracts a text abstract from a submitted article, matches it against the image tags labeled in advance on the pictures in the image resource library, retrieves the matching images through the index, sorts them by the weight values of their image tags, and pushes the related picture resource information to the user. By integrating multiple advanced artificial intelligence processing algorithms with big data technology, the invention efficiently aggregates related picture resources, extracts the meaning of the text, provides efficient illustration recommendations, and effectively helps creators find suitable illustration resources.
The apparatus of the present invention has other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the invention.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, wherein like reference numerals generally represent like parts in exemplary embodiments of the present invention.
FIG. 1 shows a schematic diagram of an article matching system according to an embodiment of the invention.
Description of reference numerals:
1. tag extraction module; 2. tag labeling module; 3. tag indexing module; 4. text abstract extraction module; 5. abstract and tag matching module; 6. aggregation module; 7. image resource library; 8. front-end user.
Detailed Description
The invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An article matching system according to the present invention comprises:
a tag extraction module, a tag labeling module, a tag indexing module, a text abstract extraction module, and an abstract and tag matching module;
the tag extraction module is used to extract the image tags of each image file in the image resource library and to store the image tags in association with the image file, where each image tag is associated with at least one image file;
the tag labeling module is used to adaptively screen the multiple image tags of each image file and to assign weight values to them;
the tag indexing module is used to establish a mapping between image tags and image files and to retrieve, for a given image tag, the image files related to it;
the text abstract extraction module is used to import a text file and extract a text abstract from it;
the abstract and tag matching module is used to match the text abstract against the image tags by synonyms or near-synonyms.
Specifically, feature tags are extracted for and labeled on all image files in the image resource library so that they can be retrieved easily. The text information in a text file is condensed into an abstract and matched against the image tags; the index is then used to find the image files in the image resource library that match the text, and these image files are sorted by the weight values assigned in advance to their tags. In this way, pictures are recommended to match the article.
In one example, the tag extraction module extracts image tags from the image files in the picture resource library: for each image file, several keywords are extracted by an image classifier or manually and stored in association with the image file as its image tags.
In one example, the image classifier is an image classification tool developed using the OpenCV vision library, a random forest algorithm and a logistic regression algorithm.
Specifically, the tag extraction module performs a preliminary extraction of tag information from picture resources, using image feature extraction techniques to extract several keywords from each picture. For example, the world-famous painting Mona Lisa can be abstracted into the keywords "woman", "beauty", "sitting" and "smile". This step can be performed by an image classifier or by manual labeling. The image classifier is an existing type of image classification tool in the field of artificial intelligence; it can be built on the OpenCV vision library and on algorithms such as random forests and logistic regression, and it classifies pictures through machine learning.
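As an illustration of how extracted keywords might be stored as image tags, the following Python sketch keeps a two-way, many-to-many mapping between image files and tags, so that one tag can be associated with several image files. The class name `TagStore`, the lower-casing of keywords and the file names are assumptions made for this example; the keywords are taken as given, whether they came from a classifier or from manual labeling.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical store associating image files with keyword tags (many-to-many)."""

    def __init__(self):
        self.tags_by_image = defaultdict(set)  # image file -> set of its tags
        self.images_by_tag = defaultdict(set)  # tag -> set of image files carrying it

    def add_image(self, image_file, keywords):
        """Store keywords (from a classifier or manual labeling) as tags of image_file."""
        for keyword in keywords:
            tag = keyword.strip().lower()  # assumed normalization: trim and lower-case
            if not tag:
                continue
            self.tags_by_image[image_file].add(tag)
            self.images_by_tag[tag].add(image_file)


store = TagStore()
store.add_image("mona_lisa.jpg", ["Woman", "beauty", "sitting", "smile"])
store.add_image("portrait_02.jpg", ["woman", "smile"])
print(sorted(store.images_by_tag["smile"]))  # ['mona_lisa.jpg', 'portrait_02.jpg']
```

Keeping both directions of the mapping means a tag lookup and an image's tag list are each a single dictionary access.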
In one example, the tag labeling module adaptively screens the image tags against the image file and assigns each tag a different weight value according to how well it matches the image file.
Specifically, the tag labeling module mainly filters tags and establishes a weighting system for them. For example, when tags in the system overlap, such as "woman" and "beauty", the tag labeling module filters them adaptively at this stage so that pictures are indexed as appropriately as possible. In addition, it assigns a weight score to each tag of each image so that index results can be sorted and otherwise processed.
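The screening and weighting step could look roughly like the sketch below. It assumes that each candidate tag arrives with a match score between 0 and 1 (for example a classifier confidence), that overlapping tags are declared in a hypothetical `redundant_with` table, and that weights are the surviving scores normalized to sum to 1. None of these rules is specified by the patent, and the threshold of 0.3 is arbitrary.

```python
def screen_and_weight(scores, redundant_with, min_score=0.3):
    """Adaptively screen an image's tags and assign normalized weights.

    scores: tag -> match degree in [0, 1] (assumed, e.g. classifier confidence).
    redundant_with: tag -> covering tag; a tag is dropped when its cover is kept.
    """
    # 1. Drop tags that match the image poorly.
    kept = {tag: s for tag, s in scores.items() if s >= min_score}
    # 2. Drop tags that overlap with another tag that is also kept.
    for tag, cover in redundant_with.items():
        if tag in kept and cover in kept:
            del kept[tag]
    # 3. Normalize the remaining scores so the weights sum to 1.
    total = sum(kept.values())
    if total == 0:
        return {}
    return {tag: round(s / total, 3) for tag, s in kept.items()}


scores = {"woman": 0.9, "beauty": 0.6, "sitting": 0.5, "smile": 0.8, "landscape": 0.1}
weights = screen_and_weight(scores, redundant_with={"beauty": "woman"})
print(weights)  # {'woman': 0.409, 'sitting': 0.227, 'smile': 0.364}
```

Normalizing per image keeps weights comparable across images with different numbers of tags; other schemes (raw confidences, manual scores) would fit the same interface.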
In one example, the tag indexing module uses an image tag to retrieve the image files stored in association with it in the image resource library, and sorts the retrieved image files by the weight values of the tags.
Specifically, the tag indexing module is mainly responsible for the design of the image index, which involves weight sorting, distributed computation, data synchronization, caching of index results and similar operations during indexing. These operations mainly serve to improve the availability and ease of use of the system, so that users can obtain the information they want within an acceptable time.
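One possible shape for such an index is sketched below: an inverted index from tags to weighted image files, sorting by accumulated tag weight, and a small in-memory result cache standing in for the buffering of index results. Distributed computation and data synchronization are deliberately left out, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag index with weight sorting and result caching.

    Distributed computation and data synchronization are out of scope here.
    """

    def __init__(self):
        self._postings = defaultdict(dict)  # tag -> {image_file: tag weight}
        self._cache = {}                    # frozenset of query tags -> ranked files

    def add(self, image_file, weighted_tags):
        for tag, weight in weighted_tags.items():
            self._postings[tag][image_file] = weight
        self._cache.clear()  # new data may change any cached ranking

    def search(self, tags):
        """Return image files carrying any of the tags, highest total weight first."""
        key = frozenset(tags)
        if key not in self._cache:
            scores = defaultdict(float)
            for tag in key:
                for image_file, weight in self._postings.get(tag, {}).items():
                    scores[image_file] += weight
            ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
            self._cache[key] = [image_file for image_file, _ in ranked]
        return list(self._cache[key])  # copy so callers cannot mutate the cache


index = TagIndex()
index.add("a.jpg", {"woman": 0.4, "smile": 0.36})
index.add("b.jpg", {"smile": 0.7})
index.add("c.jpg", {"landscape": 1.0})
print(index.search(["smile"]))           # ['b.jpg', 'a.jpg']
print(index.search(["smile", "woman"]))  # ['a.jpg', 'b.jpg']
```

Clearing the whole cache on every insert is the simplest correct invalidation policy; a larger system would more likely invalidate only the affected tags.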
In one example, the system further comprises an aggregation module, which aggregates related pictures through the index according to their image tags, so that pictures can be tagged quickly when tagging is done manually.
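One way to realize this aggregation, assumed here purely for illustration, is to group images whose tag sets overlap strongly (measured with Jaccard similarity) so that a staff member can apply a manual tag to a whole group at once. The patent does not say how similarity is measured; the greedy grouping, the 0.5 threshold and the helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate(tags_by_image, threshold=0.5):
    """Greedily group images whose tags overlap with a group's first member."""
    groups = []  # each group is a list of image files; group[0] is its representative
    for image_file in sorted(tags_by_image):
        tags = tags_by_image[image_file]
        for group in groups:
            if jaccard(tags, tags_by_image[group[0]]) >= threshold:
                group.append(image_file)
                break
        else:  # no sufficiently similar group found: start a new one
            groups.append([image_file])
    return groups


def label_group(tags_by_image, group, new_tag):
    """Apply one manually chosen tag to every image in a group."""
    for image_file in group:
        tags_by_image[image_file].add(new_tag)


tags_by_image = {
    "p1.jpg": {"woman", "smile", "portrait"},
    "p2.jpg": {"woman", "smile"},
    "s1.jpg": {"sea", "beach"},
    "s2.jpg": {"sea", "beach", "sunset"},
}
groups = aggregate(tags_by_image)
print(groups)  # [['p1.jpg', 'p2.jpg'], ['s1.jpg', 's2.jpg']]
label_group(tags_by_image, groups[1], "seaside")
```

Comparing only against each group's first member keeps the pass linear in the number of groups per image; a proper clustering algorithm could replace it without changing how staff use the groups.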
Specifically, the aggregation module mainly groups similar pictures together so that staff can label them quickly and conveniently.
In one example, the text abstract extraction module uses a text extraction algorithm to extract a limited number of keywords describing the text file of an article, and uses these keywords as the text abstract for matching against the image tags.
Specifically, the text abstract extraction module mainly extracts an information abstract from a text: it uses a suitable artificial intelligence algorithm to extract a limited number of keywords describing the text, and then uses these keywords for tag matching.
In one example, the text extraction algorithm is a natural language processing algorithm.
Specifically, natural language processing (NLP) refers to the processing, understanding and use of human languages (such as Chinese and English) by computers. It is a branch of artificial intelligence that includes syntactic analysis, semantic analysis and discourse understanding. Its applications include machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining, and it draws on data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research and computational linguistics.
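The patent names only "a natural language processing algorithm" for keyword extraction. As a stand-in, the sketch below uses plain term frequency over English text with a tiny, hypothetical stop-word list; Chinese text would first need word segmentation, and a production system would more likely use TF-IDF, TextRank or a trained model. The function name and its parameters are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this example.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "at", "to",
              "is", "was", "by", "she", "he", "her", "his", "with"}


def extract_keywords(text, k=3):
    """Return up to k of the most frequent content words as the text abstract."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Ties keep the order in which words first appeared (Counter.most_common).
    return [word for word, _ in counts.most_common(k)]


article = ("The woman smiled. The woman sat by the sea, and the sea was calm. "
           "She smiled at the sea.")
print(extract_keywords(article))  # ['sea', 'woman', 'smiled']
```

Note that "smiled" is not reduced to "smile" here; stemming or lemmatization, or the synonym matching in the next step, would be needed to connect it to a tag such as "smile".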
In one example, the abstract and tag matching module matches the text abstract against the image tags in the image resource library by synonyms or near-synonyms, calls the tag indexing module to retrieve the image files associated with the successfully matched image tags, sorts those image files by the weight values of the image tags, and then pushes them to the front-end user.
Specifically, the abstract and tag matching module matches the abstract information against the existing tags of the picture library; once a synonym or near-synonym match succeeds, the matched tags are used to index the picture data.
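A minimal sketch of synonym or near-synonym matching follows. It assumes a hand-written, hypothetical table of synonym groups (a real system might use a thesaurus or word embeddings) and an index shaped as tag -> {image file: weight}; matched tags are looked up in that index and the images are ranked by their summed tag weights.

```python
from collections import defaultdict

# Hypothetical synonym / near-synonym groups used only for this example.
SYNONYM_GROUPS = [
    {"woman", "lady", "female"},
    {"smile", "smiling", "grin"},
    {"sea", "ocean"},
]


def expand(word):
    """Return the word plus every synonym or near-synonym listed for it."""
    expanded = {word}
    for group in SYNONYM_GROUPS:
        if word in group:
            expanded |= group
    return expanded


def match_and_rank(abstract_keywords, postings):
    """Match abstract keywords to existing tags, then rank images by tag weight.

    postings: tag -> {image_file: weight}, as the tag index would hold it.
    """
    known_tags = set(postings)
    matched_tags = set()
    for keyword in abstract_keywords:
        matched_tags |= expand(keyword) & known_tags
    scores = defaultdict(float)
    for tag in matched_tags:
        for image_file, weight in postings[tag].items():
            scores[image_file] += weight
    return sorted(scores, key=lambda f: (-scores[f], f))


postings = {
    "lady": {"a.jpg": 0.5},
    "smile": {"a.jpg": 0.3, "b.jpg": 0.6},
    "ocean": {"c.jpg": 0.9},
}
print(match_and_rank(["woman", "smiling"], postings))  # ['a.jpg', 'b.jpg']
```

The ranked list is what would be pushed to the front-end user; keywords with no matching tag simply contribute nothing.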
In one example, the image resource library collects and updates image resources using crawler technology, and each image file in the image resource library further carries copyright information and price information.
Specifically, a large number of images are acquired with big data crawler technology to build the image resource library, and each image file is labeled with the copyright information and price information of its source so that users can choose conveniently.
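To show how copyright and price information might travel with each image and support the editor's choice, here is a small sketch. The `ImageRecord` fields, the boolean `licensed` flag, the price unit and the example.com URLs are assumptions; the crawler that populates the records is not shown.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record for one crawled image in the resource library."""
    file: str
    source_url: str
    copyright_holder: str
    licensed: bool   # assumed flag: usage rights are known and can be obtained
    price: float     # assumed: price per use in CNY; 0.0 means free


def affordable_licensed(records, max_price):
    """Keep licensed images within budget, cheapest first (ties by file name)."""
    usable = [r for r in records if r.licensed and r.price <= max_price]
    return sorted(usable, key=lambda r: (r.price, r.file))


records = [
    ImageRecord("a.jpg", "https://example.com/a", "Studio A", True, 50.0),
    ImageRecord("b.jpg", "https://example.com/b", "Photographer B", True, 0.0),
    ImageRecord("c.jpg", "https://example.com/c", "unknown", False, 0.0),
    ImageRecord("d.jpg", "https://example.com/d", "Agency D", True, 500.0),
]
print([r.file for r in affordable_licensed(records, max_price=100)])  # ['b.jpg', 'a.jpg']
```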
Embodiment:
FIG. 1 shows a schematic diagram of an article matching system according to an embodiment of the invention.
As shown in fig. 1, an article matching system according to the present invention includes:
a tag extraction module 1, a tag labeling module 2, a tag indexing module 3, a text abstract extraction module 4 and an abstract and tag matching module 5. The tag extraction module 1 extracts the image tags of each image file in the image resource library 7 and stores them in association with the image file, where each image tag is associated with at least one image file. The tag labeling module 2 adaptively screens the multiple image tags of each image file and assigns them weight values. The tag indexing module 3 establishes a mapping between image tags and image files and retrieves the image files related to a given image tag. The text abstract extraction module 4 imports a text file and extracts a text abstract from it. The abstract and tag matching module 5 matches the text abstract against the image tags by synonyms or near-synonyms. The tag extraction module 1 extracts image tags from the image files in the image resource library: for each image file, several keywords are extracted by an image classifier or manually and stored in association with the image file. Any suitable artificial intelligence image classifier may be used, for example one developed using the OpenCV vision library, a random forest algorithm and a logistic regression algorithm. The tag labeling module 2 adaptively screens the image tags against the image file and assigns each tag a different weight value according to how well it matches the image file.
The tag indexing module 3 uses an image tag to retrieve the image files stored in association with it in the image resource library 7 and sorts the retrieved image files by the weight values of the tags. The system further comprises an aggregation module 6, which aggregates related pictures through the index according to their image tags, so that pictures can be tagged quickly when tagging is done manually. The text abstract extraction module 4 uses a natural language processing algorithm to extract a limited number of keywords describing the text file of an article and uses these keywords as the text abstract for matching against the image tags. The abstract and tag matching module 5 matches the text abstract against the image tags in the image resource library by synonyms or near-synonyms, calls the tag indexing module 3 to retrieve the image files associated with the successfully matched image tags, sorts those image files by the weight values of the image tags, and then pushes them to the front-end user 8. The image resource library 7 collects and updates image resources using crawler technology, and each image file in it further carries copyright information and price information, so that editors can easily select properly licensed, inexpensive pictures.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. An article matching system, comprising:
a tag extraction module, a tag labeling module, a tag indexing module, a text abstract extraction module and an abstract and tag matching module;
wherein the tag extraction module is configured to extract an image tag of each image file in an image resource library and to store the image tag in association with the image file, each image tag being associated with at least one image file;
the tag labeling module is configured to adaptively screen a plurality of image tags of each image file and to assign weight values to them;
the tag indexing module is configured to establish a mapping between the image tags and the image files and to retrieve, according to an image tag, the image files related to that image tag;
the text abstract extraction module is configured to import a text file and extract a text abstract from the text file; and
the abstract and tag matching module is configured to match the text abstract against the image tags by synonyms or near-synonyms.
2. The article matching system of claim 1, wherein the tag extraction module extracts image tags from the image files in the image resource library by extracting a plurality of keywords from each image file through an image classifier or manually, and stores the keywords in association with the image file.
3. The article matching system of claim 2, wherein the image classifier is an image classification tool developed based on the OpenCV vision library, a random forest algorithm and a logistic regression algorithm.
4. The article matching system of claim 1, wherein the tag labeling module adaptively screens the image tags against the image files and assigns each tag a different weight value according to the degree to which the image tag matches the image file.
5. The article matching system of claim 1, wherein the tag indexing module retrieves from the image resource library, according to the image tags, the image files stored in association with the image tags, and sorts the retrieved image files according to the weight values of the tags.
6. The article matching system of claim 1, further comprising an aggregation module, wherein the aggregation module aggregates related images through the index according to the image tags, so that the images can be tagged rapidly when tagging is done manually.
7. The article matching system of claim 1, wherein the text abstract extraction module extracts, by a text extraction algorithm, a limited number of keywords describing the text file of an article, and matches the image tags using the keywords as the text abstract.
8. The article matching system of claim 7, wherein the text extraction algorithm is a natural language processing algorithm.
9. The article matching system of claim 1, wherein the abstract and tag matching module matches the text abstract against the image tags in the image resource library by synonyms or near-synonyms, calls the tag indexing module to retrieve the image files associated with the image tags successfully matched by synonym or near-synonym, sorts the image files according to the weight values of the tags, and pushes the image files to a front-end user.
10. The article matching system of claim 1, wherein the image resource library collects and updates image resources based on crawler technology, and each image file in the image resource library further comprises copyright information and price information.

Priority Applications (1)

Application number: CN201811644858.6A (CN110866086A); priority date: 2018-12-29; filing date: 2018-12-29; title: Article matching system

Publications (1)

Publication number: CN110866086A; publication date: 2020-03-06

Family

ID=69651605

Country Status (1)

CN (1): CN110866086A

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613293A (en) * 2020-12-29 2021-04-06 北京中科闻歌科技股份有限公司 Abstract generation method and device, electronic equipment and storage medium
CN112784079A (en) * 2020-12-31 2021-05-11 深圳市汇深网信息科技有限公司 Picture text making method and device, electronic equipment and storage medium
WO2023045710A1 (en) * 2021-09-27 2023-03-30 北京字节跳动网络技术有限公司 Multimedia display and matching methods and apparatuses, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239535A (en) * 2014-09-22 2014-12-24 重庆邮电大学 Method and system for matching pictures with characters, server and terminal
CN105512326A (en) * 2015-12-23 2016-04-20 成都品果科技有限公司 Picture recommending method and system
CN106446950A (en) * 2016-09-27 2017-02-22 腾讯科技(深圳)有限公司 Image processing method and device
CN107391703A (en) * 2017-07-28 2017-11-24 北京理工大学 The method for building up and system of image library, image library and image classification method
CN108597003A (en) * 2018-04-20 2018-09-28 腾讯科技(深圳)有限公司 A kind of article cover generation method, device, processing server and storage medium
CN108733779A (en) * 2018-05-04 2018-11-02 百度在线网络技术(北京)有限公司 The method and apparatus of text figure
CN108984657A (en) * 2018-06-28 2018-12-11 Oppo广东移动通信有限公司 Image recommendation method and apparatus, terminal, readable storage medium storing program for executing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination