CN108921193B - Picture input method, server and computer storage medium - Google Patents

Info

Publication number
CN108921193B
Authority
CN
China
Prior art keywords
picture
grabbing
pictures
rule
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810525540.XA
Other languages
Chinese (zh)
Other versions
CN108921193A (en)
Inventor
张师琲
侯丽
王炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810525540.XA
Priority to PCT/CN2018/102077 (WO2019227705A1)
Publication of CN108921193A
Application granted
Publication of CN108921193B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a picture input method comprising the following steps: receiving a picture grabbing request and starting a picture grabbing task to grab pictures asynchronously; storing the grabbed pictures into a first data set; obtaining picture attribute information and picture characteristics; preliminarily classifying the pictures and preliminarily labeling them by using the picture attribute information as label information; selecting a first picture in the first data set and selecting a plurality of other pictures that are close to the first picture in picture characteristics; obtaining a plurality of fitting coefficients of the first picture; constructing a label of the first picture from the labels of the other pictures according to the fitting coefficients of the first picture; and labeling the first picture again with the constructed label. The invention also provides a server and a computer readable storage medium. The picture input method, the server and the computer readable storage medium provided by the invention can classify and label the acquired pictures efficiently and quickly.

Description

Picture input method, server and computer storage medium
Technical Field
The invention relates to the technical field of picture identification, in particular to a picture input method, a server and a computer storage medium.
Background
The base pictures used for general picture recognition suffer from scarce sources: for example, they are entered by the organizations that use them into their own data platforms, and the recorded information is limited. In addition, a great deal of manual classification and labeling of the base pictures is required before recognition. In most projects, 70% of the time is spent on data acquisition and labeling, which wastes much time and labor. Manual labeling and classification are also error-prone and inefficient.
Therefore, how to obtain a large number of pictures quickly and to classify and label them efficiently has become a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, the present invention provides a picture input method, a server and a computer storage medium, so as to solve the problem of how to obtain a large number of pictures quickly and to classify and label them efficiently.
Firstly, in order to achieve the above object, the present invention provides a picture input method, which comprises the steps of:
receiving a picture grabbing request, starting a picture grabbing task, wherein the grabbing task comprises a main grabbing process, the main grabbing process analyzes the mapping relation between the grabbing request and a preset picture grabbing rule, and starts a plurality of grabbing sub-processes to carry out picture asynchronous grabbing according to the mapping relation, and the grabbing sub-processes correspond to a picture grabbing model established based on the preset picture grabbing rule;
storing the captured pictures into a first data set, acquiring picture attribute information and picture characteristics of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information;
selecting a first picture in the first data set, selecting a plurality of other pictures in the first data set which are similar to the first picture in picture characteristics, and fitting the picture characteristics of the first picture by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the first picture;
constructing a label of the first picture by using labels of the other pictures according to the fitting coefficients of the first picture, and labeling the first picture again by the constructed labels; and
performing distributed storage on the classified and twice-labeled pictures according to the classification result;
wherein, the preset picture capturing rule comprises:
a first grabbing rule, which grabs according to a specified URL, wherein a first grabbing model is established based on the first grabbing rule;
a second grabbing rule, which grabs a range using regular matching, wherein a second grabbing model is established based on the second grabbing rule; and
a third grabbing rule, which grabs specified page elements, wherein a third grabbing model is established based on the third grabbing rule.
Preferably, in the process of capturing the picture, the method further comprises a step of simulating manual access to cope with the anti-grabbing restrictions of the target website, and the step of simulating manual access specifically comprises:
finding the hidden information for logging in to the target website and storing the content of the hidden information, wherein the hidden information is information required for logging in to the target website;
submitting the hidden information to simulate logging in to the website; and
after the simulated login succeeds, obtaining the logged-in information and capturing pictures from the target website according to the preset picture capturing rule.
Preferably, the main process is further configured to monitor the number of the image capture tasks in the plurality of capture sub-processes, when a new image capture task arrives, the main process distributes the new task to the sub-process in which the number of the image capture tasks in the plurality of capture sub-processes is smaller than a preset value, and when the number of the image capture tasks in all the capture sub-processes is larger than the preset value, the main process newly creates a sub-process and distributes the new task to the newly created sub-process.
Preferably, the method for selecting a plurality of similar other pictures comprises:
extracting the picture features of each picture in the first data set;
calculating the distance between the features of the current picture and the features of each of the remaining pictures; and
selecting a preset number of pictures with the minimum distance as the preset number of nearest neighbor pictures of the current picture;
wherein the current picture is a randomly or sequentially selected picture.
Preferably, the feature is a color histogram feature, a texture feature or a shape feature, and the distance is a Euclidean distance.
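The neighbor selection described above can be sketched as follows. This is a minimal illustration only: the function name `nearest_neighbors` and the three-bin color histograms are hypothetical, and the features are assumed to be plain lists of numbers of equal length.

```python
import math


def nearest_neighbors(features, index, k):
    """Return the indices of the k pictures closest to features[index]."""
    current = features[index]
    distances = []
    for j, other in enumerate(features):
        if j == index:
            continue  # the current picture is not its own neighbor
        # Euclidean distance between the two feature vectors
        distances.append((math.dist(current, other), j))
    distances.sort()  # ascending distance
    return [j for _, j in distances[:k]]


# Hypothetical 3-bin color histograms for four pictures.
histograms = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.75, 0.15, 0.10],
]
neighbors = nearest_neighbors(histograms, 0, 2)
```

Here the "preset number" of the text corresponds to the parameter `k`; a texture or shape feature could be substituted for the histogram without changing the function.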
Preferably, obtaining a plurality of fitting coefficients for the picture comprises the steps of:
calculating a correlation matrix $C$ of size $k \times k$, whose element in the $m$-th row and $n$-th column is $C_{mn} = (x_i - x_{i_m})^{\top}(x_i - x_{i_n})$, $m, n = 1, \dots, k$;
solving the linear system $C W = \mathbf{1}$, where $\mathbf{1}$ is the all-ones vector, to obtain a fitting coefficient vector $W$; and
normalizing each coefficient of the fitting coefficient vector $W$;
wherein the feature corresponding to the current picture is $x_i$, the features of its $k$ nearest neighbor pictures are $\{x_{i_1}, \dots, x_{i_k}\}$, and the fitting coefficient vector is $W = \{w_1, \dots, w_k\}$.
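The three steps above (correlation matrix, linear solve, normalization) can be sketched in plain Python as below. The function names and the sample vectors are hypothetical. One assumption goes beyond the text: a small ridge term is added to the diagonal of the matrix so that the system stays solvable when the matrix is singular (for example when there are more neighbors than feature dimensions); the source does not mention this.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fitting_coefficients(x, neighbors, reg=1e-3):
    """Fitting coefficients of feature x with respect to its neighbor features."""
    k = len(neighbors)
    # Difference vectors x - x_m, one per neighbor.
    diffs = [[a - b for a, b in zip(x, nb)] for nb in neighbors]
    # C[m][n] = (x - x_m) . (x - x_n)
    C = [[sum(p * q for p, q in zip(diffs[m], diffs[n])) for n in range(k)]
         for m in range(k)]
    # Assumption (not in the source): ridge term to keep C invertible.
    trace = sum(C[m][m] for m in range(k))
    eps = reg * trace if trace > 0 else reg
    for m in range(k):
        C[m][m] += eps
    w = solve(C, [1.0] * k)          # solve C W = 1
    total = sum(w)
    return [wi / total for wi in w]  # normalize so the coefficients sum to 1


# Hypothetical example: the current feature lies midway between two neighbors.
weights = fitting_coefficients([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]])
```

In this example the two neighbors are symmetric about the current feature, so the normalized coefficients come out equal.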
Preferably, in order to obtain labels of all pictures in the first data set, the method further comprises the steps of:
randomly or sequentially selecting one picture in the first data set;
constructing the label of the selected picture by combining the labels of the plurality of other pictures corresponding to the selected picture, weighted by the corresponding fitting coefficients; and
repeating the above steps until a label is constructed for each picture in the first data set.
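A minimal sketch of the label construction step is given below. The source states only that the neighbor labels are combined using the fitting coefficients; representing labels as sets of tags, summing the coefficients per tag, and keeping tags whose score reaches a `threshold` are all assumptions made for illustration, as are the names used.

```python
def construct_label(neighbor_labels, weights, threshold=0.5):
    """Combine neighbor tag sets into one label using fitting coefficients."""
    scores = {}
    for labels, w in zip(neighbor_labels, weights):
        for tag in labels:
            # Each neighbor votes for its tags with its fitting coefficient.
            scores[tag] = scores.get(tag, 0.0) + w
    # Assumption: keep only tags whose accumulated weight reaches the threshold.
    return sorted(tag for tag, s in scores.items() if s >= threshold)


# Hypothetical neighbor labels and normalized fitting coefficients.
neighbor_labels = [{"beach", "2018"}, {"beach"}, {"city"}]
weights = [0.6, 0.3, 0.1]
label = construct_label(neighbor_labels, weights)
```

Looping this function over every picture in the data set, each with its own neighbors and coefficients, corresponds to the "repeating the above steps" clause.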
In addition, in order to achieve the above object, the present invention further provides a server, which includes a memory, a processor, and a picture entry system stored on the memory and operable on the processor, wherein the picture entry system, when executed by the processor, implements the steps of the picture entry method as described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a picture entry system, which is executable by at least one processor to cause the at least one processor to perform the steps of the picture entry method as described above.
Compared with the prior art, the image input method, the server and the computer readable storage medium provided by the invention firstly receive an image capture request, start an image capture task, wherein the capture task comprises a capture main process, the capture main process analyzes the mapping relation between the capture request and a preset image capture rule, and start a plurality of capture sub-processes to carry out asynchronous image capture according to the mapping relation, and the capture sub-processes correspond to an image capture model established based on the preset image capture rule; secondly, storing the captured pictures into a first data set, acquiring picture attribute information of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information; selecting a picture in the first data set, selecting a plurality of other pictures which are similar to the picture in picture characteristics in the first data set, fitting the picture characteristics of the picture by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the picture, constructing a label of the picture by using the labels of the other pictures according to the fitting coefficients of the picture, and labeling the picture again by using the constructed label; and finally, performing distributed storage on the classified and labeled pictures according to the classification result. 
By adopting the picture input method, the server and the computer readable storage medium provided by the invention, the pictures on the network can be quickly obtained, and the obtained pictures can be efficiently and quickly classified and labeled, so that the manpower and material resources are greatly reduced, the cost is greatly saved, and the method is more convenient, quick and accurate compared with the prior art.
Drawings
FIG. 1 is a schematic diagram of an alternative hardware architecture for a server according to the present invention;
FIG. 2 is a schematic view of program modules of a first embodiment of the picture entry system of the present invention;
FIG. 3 is a schematic flow chart of a first embodiment of the picture input method according to the present invention;
FIG. 4 is a schematic flow chart of a second embodiment of the picture input method according to the present invention;
FIG. 5 is a schematic flow chart of a third embodiment of the picture input method according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Fig. 1 is a schematic diagram of an alternative hardware architecture of the server 1 according to the present invention.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus. It is noted that fig. 1 only shows the server 1 with components 11-13, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The server 1 may be a rack server, a blade server, a tower server, or a cabinet server, and the server 1 may be an independent server or a server cluster formed by a plurality of servers.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the server 1, such as a hard disk or a memory of the server 1. In other embodiments, the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the server 1. Of course, the memory 11 may also comprise both an internal storage unit of the server 1 and an external storage device thereof. In this embodiment, the memory 11 is generally used for storing an operating system installed in the server 1 and various types of application software, such as program codes of the picture entry system 2. Furthermore, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used to control the overall operation of the server 1. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or to process data, for example, to run the picture entry system 2.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is generally used for establishing communication connection between the server 1 and other electronic devices.
The hardware structure and functions of the related devices of the present invention have been described in detail so far. Various embodiments of the present invention will be presented based on the above description.
First, the present invention provides a picture entry system 2.
Fig. 2 is a schematic view of program modules of a first embodiment of the picture entry system 2 according to the present invention.
In this embodiment, the picture entry system 2 includes a series of computer program instructions stored on the memory 11, which, when executed by the processor 12, can implement the picture entry operations of the embodiments of the present invention. In some embodiments, the picture entry system 2 may be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in Fig. 2, the picture entry system 2 can be divided into an image capturing module 21, a first labeling and classifying module 22, a second labeling and classifying module 23 and a storage module 24. Wherein:
the image capturing module 21 is configured to receive an image capturing request, start an image capturing task, where the capturing task includes a capturing main process, the capturing main process analyzes a mapping relationship between the capturing request and a preset image capturing rule, and starts a plurality of capturing sub processes according to the mapping relationship to perform asynchronous capturing of images, where the capturing sub processes correspond to an image capturing model established based on the preset image capturing rule;
Specifically, the capture request is input by a user, who can choose different ways of capturing pictures on the internet according to different requirements. For example, the user can specify a website and capture the pictures on the web pages of that website. The user can also use regular matching to define a range of websites and capture pictures within the range defined by a regular expression. A regular expression (often abbreviated as regex, regexp or RE in code) is a concept from computer science that is typically used to retrieve or replace text conforming to a certain pattern (rule). It is a logical formula for operating on character strings, which consist of ordinary characters (e.g., the letters a to z) and special characters (called metacharacters): predefined characters and combinations of them form a "rule string" that expresses a filtering logic for character strings. In other words, a regular expression is a text pattern that describes one or more strings to be matched when searching text; for example, a regular expression can be written to match a complete domain name.
For example, for a web address such as www.baidu.com, the matching regular expression may be:
^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*$
The regular expression for matching an HTTP URL may be:
^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*([\?&]\w+=\w*)*$
for example: http://www.tetet.com/index.html?q=1&m=test.
These regular expressions follow the DNS rules: each label in a domain name consists of English letters and digits, does not exceed 63 characters, and is case-insensitive. No punctuation mark other than the hyphen (-) may be used in a label. The lowest-level domain name is written at the far left and the highest-level domain name at the far right. A complete domain name composed of multiple labels does not exceed 255 characters in total. The regular expressions above are merely examples and are not described in further detail here.
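The two expressions quoted above can be tried out with Python's `re` module, as in the sketch below. The patterns are copied from the text; the constant and function names are hypothetical, and checking a candidate address against a pattern is only one possible way of deciding whether it falls inside the grabbing range.

```python
import re

# Pattern for a web address, copied from the text.
WEB_ADDRESS = re.compile(
    r"^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}"
    r"(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*$"
)

# Pattern for an HTTP URL with an optional query string, copied from the text.
HTTP_URL = re.compile(
    r"^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}"
    r"(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*([\?&]\w+=\w*)*$"
)


def in_range(address, pattern=HTTP_URL):
    """Return True if the address matches the given range pattern."""
    return pattern.match(address) is not None


matches_domain = in_range("www.baidu.com", WEB_ADDRESS)
matches_url = in_range("http://www.tetet.com/index.html?q=1&m=test")
rejects_text = not in_range("not a url")
```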
Specifically, the user may also specify page elements for grabbing. The web page is composed of individual web page elements, for example, navigation, website logo, advertisement bar, picture, text, animation, ornament, hyperlink, etc., and these various elements constitute a complete web page, and the individual web pages are the most indispensable parts in the internet.
Specifically, the preset picture capture rule includes:
a first grabbing rule, which grabs according to a specified URL, wherein a first grabbing model is established based on the first grabbing rule;
a second grabbing rule, which grabs a range using regular matching, wherein a second grabbing model is established based on the second grabbing rule; and
a third grabbing rule, which grabs specified page elements, wherein a third grabbing model is established based on the third grabbing rule.
Specifically, a picture capture model is established for each preset picture capture rule. The preset rules are: 1. grabbing according to a specified URL; 2. range grabbing using regular matching; 3. grabbing specified page elements, for example in sequence. Correspondingly, a specified-URL picture capture model, a regular-matching picture capture model and a specified-element picture capture model are established, respectively.
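The mapping from a grab request to one of the three capture models might look like the sketch below. Everything here is illustrative: the request keys (`url`, `pattern`, `element`, `page`), the function names, and the in-memory `pages` dictionary standing in for downloaded web pages are assumptions, and no network access is performed.

```python
import re
from html.parser import HTMLParser


class ElementSrcParser(HTMLParser):
    """Collects the src attribute of every start tag with the given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def sources(html, tag="img"):
    parser = ElementSrcParser(tag)
    parser.feed(html)
    return parser.sources


def grab_by_url(request, pages):
    # Rule 1: grab the pictures on one specified page.
    return sources(pages.get(request["url"], ""))


def grab_by_regex(request, pages):
    # Rule 2: grab the pictures on every page whose address matches the pattern.
    pattern = re.compile(request["pattern"])
    found = []
    for url, html in pages.items():
        if pattern.search(url):
            found.extend(sources(html))
    return found


def grab_by_element(request, pages):
    # Rule 3: grab a specified page element (tag name) on a given page.
    return sources(pages.get(request["page"], ""), tag=request["element"])


# Hypothetical mapping between request keys and capture models.
MODELS = {"url": grab_by_url, "pattern": grab_by_regex, "element": grab_by_element}


def grab(request, pages):
    for key, model in MODELS.items():
        if key in request:
            return model(request, pages)
    raise ValueError("request matches no preset grabbing rule")


pages = {
    "http://example.com/a.html": '<img src="a1.jpg"><embed src="a2.swf">',
    "http://example.com/b.html": '<img src="b1.jpg">',
    "http://other.org/c.html": '<img src="c1.jpg">',
}
by_url = grab({"url": "http://example.com/a.html"}, pages)
by_regex = grab({"pattern": r"^http://example\.com/"}, pages)
by_element = grab({"element": "embed", "page": "http://example.com/a.html"}, pages)
```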
Specifically, in the process of capturing pictures, when a website imposes capture restrictions, for example, requiring a login before a web page can be viewed, a simulated manual access step may be set, which may include:
1. finding the hidden information for logging in to the website and storing its content; specifically, opening the developer tools, logging in manually once, and locating the data segment of the submitted request that carries the hidden information, which is the information required for login;
2. submitting the information to simulate logging in to the website;
3. after the simulated login succeeds, obtaining the logged-in information.
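The first of these steps, collecting the hidden login information, can be illustrated with the standard-library HTML parser. This is a sketch under assumptions: the hidden information is taken to be the `<input type="hidden">` fields of a login form, and the field names `csrf_token`, `username` and `password` are hypothetical. Submitting the payload and keeping the returned session information would be done with an HTTP client and is not shown.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of hidden <input> fields in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def login_payload(login_html, username, password):
    """Build the form data to submit: hidden fields plus the credentials."""
    parser = HiddenFieldParser()
    parser.feed(login_html)
    payload = dict(parser.fields)
    payload["username"] = username  # hypothetical field name
    payload["password"] = password  # hypothetical field name
    return payload


# Hypothetical login page.
login_page = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="csrf_token" value="abc123">'
    '<input type="text" name="username">'
    '<input type="password" name="password">'
    "</form>"
)
payload = login_payload(login_page, "alice", "secret")
```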
Specifically, the main process is further configured to monitor the number of picture capturing tasks in each sub-process. When a new picture capturing task arrives, the main process assigns it to a sub-process whose number of picture capturing tasks is smaller than a preset value; when the number of picture capturing tasks in every sub-process is larger than the preset value, the main process creates a new sub-process and assigns the new task to it.
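The allocation policy can be sketched as below. For simplicity the sub-processes are simulated by in-memory task lists rather than real operating-system processes, and the class and parameter names are hypothetical. The text does not say what happens when a count equals the preset value; this sketch treats such a sub-process as full.

```python
class MainProcess:
    """Simulates the main grabbing process assigning tasks to sub-processes."""

    def __init__(self, preset_value, initial_workers=2):
        self.preset_value = preset_value
        # Each inner list stands in for the task queue of one sub-process.
        self.workers = [[] for _ in range(initial_workers)]

    def dispatch(self, task):
        """Assign a task and return the index of the sub-process that got it."""
        for index, queue in enumerate(self.workers):
            if len(queue) < self.preset_value:
                queue.append(task)
                return index
        # Every sub-process is full: create a new one for this task.
        self.workers.append([task])
        return len(self.workers) - 1


main = MainProcess(preset_value=2, initial_workers=1)
assignments = [main.dispatch(f"task-{i}") for i in range(5)]
```

With a preset value of 2 and one initial sub-process, the third and fifth tasks each trigger the creation of a new sub-process.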
The first labeling and classifying module 22 is configured to store the captured pictures in a first data set, acquire picture attribute information of the pictures in the first data set, preliminarily classify the pictures according to the picture attribute information, and preliminarily label the pictures by using the picture attribute information as label information.
Specifically, the picture attribute information includes the time, the place, the picture name and the like. The pictures can be classified according to the time and place at which they were generated: for example, by time into different years, different months and different dates, and by place into country, province, city, district, county and so on. The picture attribute information is stored in the picture itself and can be read by a picture attribute reading program. Obtaining the picture attribute information comprises the following steps: 1. loading the picture information; 2. analyzing and filtering the picture information to obtain the picture attribute information of the picture; and 3. outputting the picture attribute information of the picture.
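A minimal sketch of the preliminary classification follows. It assumes the attribute information has already been read from each picture into a dictionary with hypothetical keys `time`, `place` and `name`, and that the time uses the format `YYYY-MM-DD HH:MM:SS`; reading the attributes from the picture file itself is not shown.

```python
from collections import defaultdict
from datetime import datetime


def preliminary_label(attributes):
    """Turn raw attribute information into label information."""
    taken = datetime.strptime(attributes["time"], "%Y-%m-%d %H:%M:%S")
    return {
        "year": str(taken.year),
        "month": f"{taken.month:02d}",
        "place": attributes["place"],
        "name": attributes["name"],
    }


def classify(pictures, key):
    """Group picture names by one label field, e.g. 'year' or 'place'."""
    groups = defaultdict(list)
    for attributes in pictures:
        label = preliminary_label(attributes)
        groups[label[key]].append(label["name"])
    return dict(groups)


# Hypothetical attribute records for three captured pictures.
pictures = [
    {"time": "2018-05-28 10:00:00", "place": "Shenzhen", "name": "a.jpg"},
    {"time": "2018-06-01 09:30:00", "place": "Beijing", "name": "b.jpg"},
    {"time": "2017-12-31 23:59:59", "place": "Shenzhen", "name": "c.jpg"},
]
by_year = classify(pictures, "year")
by_place = classify(pictures, "place")
```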
Specifically, the obtained image attribute information may be screened, and the screened image attribute information is used as tag information to preliminarily label the image, for example, time, place, and image name in the image may be selected to label the image. The classification of pictures is one of the main methods for labeling pictures, and since a picture can be labeled with a plurality of class labels, the labeling of pictures based on classification is a multi-label picture classification problem. In addition, the picture classification can also be used for automatic filing of the pictures, so that intra-class retrieval is realized, and the query efficiency is improved.
The second labeling and classifying module 23 is configured to select a picture in the first data set, select a plurality of other pictures in the first data set that are similar to the picture in picture characteristics, obtain a plurality of fitting coefficients of the picture by fitting the picture characteristics of the picture with the picture characteristics of the other pictures, construct a label of the picture from the labels of the other pictures according to the fitting coefficients of the picture, and label the picture again with the constructed label.
Specifically, a picture is usually associated with some text description information, such as a title, a subject word, comment information, and the like, to indicate information such as the content of the picture, a shooting location, personal feelings, and evaluations. Therefore, the pictures can be labeled based on the information, or the subject words can be directly used as the labels.
It should be noted that, in the pictures captured from the web, a part of the pictures contains tags, and a part of the pictures does not contain tags, and it is a central idea of the method to tag the pictures without tags by using similar pictures with tags.
The storage module 24 is configured to perform distributed storage on the classified and labeled pictures according to a classification result.
Specifically, storing the pictures in a distributed manner according to their categories facilitates picture management and searching. For example, the picture attribute information includes the time, the place, the picture name and the like, so the pictures can be classified by the time and place at which they were generated: by time into different years, months and dates, and by place into country, province, city, district, county and so on.
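One possible way to route pictures to storage by category is sketched below. The source does not describe the storage layout, so the node names, the `place/year` category key, and the hash-based choice of node are all assumptions made for illustration.

```python
import hashlib


def storage_node(category, nodes):
    """Pick a storage node for a category; the same category always maps to the same node."""
    digest = hashlib.sha256(category.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def storage_path(picture_name, labels, nodes):
    """Build a node/category/name path from the picture's classification labels."""
    category = f"{labels['place']}/{labels['year']}"
    return f"{storage_node(category, nodes)}/{category}/{picture_name}"


# Hypothetical storage nodes and labels.
nodes = ["node-0", "node-1", "node-2"]
path = storage_path("a.jpg", {"place": "Shenzhen", "year": "2018"}, nodes)
```

Because pictures of one category land on one node, a search within a category only needs to consult that node.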
In addition, the invention also provides a picture input method.
Fig. 3 is a schematic flow chart of a first embodiment of the picture input method according to the present invention. In this embodiment, the execution order of the steps in the flowchart shown in Fig. 3 may be changed, and some steps may be omitted, according to different requirements.
Step S110, receiving a picture capturing request, starting a picture capturing task, wherein the capturing task comprises a capturing main process, the capturing main process analyzes the mapping relation between the capturing request and a preset picture capturing rule, and starts a plurality of capturing sub-processes to perform picture asynchronous capturing according to the mapping relation, and the capturing sub-processes correspond to a picture capturing model established based on the preset picture capturing rule.
Specifically, the capture request is input by a user, who can capture pictures on the internet in different ways according to different needs. For example, the user can specify a website from which to capture pictures, can use regular matching to define a range of websites and capture pictures within the range defined by the regular expression, or can specify page elements to capture. The specified page elements can be captured recursively or in sequence.
Step S120, storing the captured pictures in a first data set, acquiring picture attribute information of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information.
In particular, picture classification is one of the main methods for labeling pictures, and since a picture can be labeled with multiple category labels, the classification-based picture labeling is a multi-label picture classification problem. In addition, the image classification can also be used for automatic filing of the images, so that intra-class retrieval is realized, and the query efficiency is improved.
Step S130, selecting a picture in the first data set, selecting a plurality of other pictures similar to the picture in picture characteristics in the first data set, obtaining a plurality of fitting coefficients of the picture by fitting the picture characteristics of the picture with the picture characteristics of the other pictures, constructing a label of the picture by using the labels of the other pictures according to the fitting coefficients of the picture, and labeling the picture again through the constructed label.
Specifically, obtaining a plurality of fitting coefficients of the picture comprises the steps of:
the plurality of fitting coefficients for a given picture are obtained by minimizing the error in fitting the given picture by a plurality of other pictures that are similar in picture characteristics to the picture.
The steps of obtaining the fitting coefficients are described below, taking a given image and its k nearest neighbor images as an example:
assume that the feature of the current image is $x_i$, the features of its k nearest neighbor images are $\{x_{i_1}, \ldots, x_{i_k}\}$, and the fitting coefficient vector is $W = \{w_1, \ldots, w_k\}$.
1, calculating a correlation matrix $C$ of size $k \times k$, wherein the element in the m-th row and n-th column of the matrix is: $C_{mn} = (x_i - x_{i_m}) \cdot (x_i - x_{i_n})$, $m, n = 1, \ldots, k$.
2, solving the linear system $C W = \mathbf{1}$, where $\mathbf{1}$ is the all-ones vector of length k, to obtain the fitting coefficient vector $W$;
3, normalizing the coefficients of the fitting coefficient vector $W$, i.e. dividing the value of each element of $W$ by the sum of all the elements.
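The three fitting-coefficient steps above (correlation matrix, linear solve, normalization) can be sketched in plain Python as follows. The function name `fitting_coefficients` and the small regularization term `reg` are assumptions: the source does not mention regularization, but without it the matrix C is singular whenever the neighbors outnumber the feature dimensions.

```python
def fitting_coefficients(x, neighbors, reg=1e-3):
    """Return normalized weights (summing to 1) that fit x from its neighbors."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, n)] for n in neighbors]
    # Step 1: C[m][n] = (x - x_m) . (x - x_n)
    C = [[sum(a * b for a, b in zip(diffs[m], diffs[n])) for n in range(k)]
         for m in range(k)]
    # Assumption (not in the source): add a small ridge term so C is invertible.
    trace = sum(C[m][m] for m in range(k)) or 1.0
    for m in range(k):
        C[m][m] += reg * trace
    # Step 2: solve C w = 1 by Gaussian elimination with partial pivoting.
    A = [row[:] + [1.0] for row in C]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(A[r][c] * w[c] for c in range(r + 1, k))
        w[r] = (A[r][k] - tail) / A[r][r]
    # Step 3: normalize so the coefficients sum to 1.
    total = sum(w)
    return [wi / total for wi in w]
```

For a picture whose feature lies midway between two neighbors, the two coefficients come out equal; a picture closer to one neighbor gives that neighbor the larger coefficient.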
Specifically, in order to obtain labels of all pictures in the first data set, the method further includes the steps of:
1, randomly or sequentially selecting one picture in the first data set;
2, fitting the label of the selected picture by combining the labels of the plurality of other pictures corresponding to the selected picture, weighted by the corresponding fitting coefficients;
3, repeating step 1 and step 2 until a label is constructed for each picture in the first data set.
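One way to read the label-construction loop above is as a weighted vote over neighbor labels. The sketch below takes that reading. The `threshold` that turns weighted scores back into a discrete label set is an assumption, since the source does not say how the fitted label is discretized, and the names `construct_label` and `relabel_all` are hypothetical.

```python
from collections import defaultdict

def construct_label(neighbor_labels, weights, threshold=0.5):
    """Combine neighbor label sets, weighted by the fitting coefficients."""
    scores = defaultdict(float)
    for labels, w in zip(neighbor_labels, weights):
        for tag in labels:
            scores[tag] += w
    # Assumption: keep a tag when its accumulated weight reaches the threshold.
    return {tag for tag, s in scores.items() if s >= threshold}

def relabel_all(labels, neighbors, weights, threshold=0.5):
    """Visit every picture in sequence and construct its label from its neighbors."""
    return {
        pic: construct_label(
            [labels[n] for n in neighbors[pic]], weights[pic], threshold
        )
        for pic in labels
    }
```

Here `labels` maps each picture to its preliminary label set, `neighbors` maps each picture to its selected similar pictures, and `weights` holds the matching fitting coefficients in the same order.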
And step S140, performing distributed storage on the classified and labeled pictures according to the classification result.
Specifically, storing pictures in a distributed manner according to their categories facilitates picture management and searching. For example, the picture attribute information includes the time and place at which the picture was generated, the picture name, and so on, and the pictures can be classified according to time and place: by time, the pictures can be classified by year, month, and date; by place, the pictures can be classified by country, province, city, district, county, and so on.
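A category-based storage layout of the kind described for step S140 could be sketched as below. The path scheme, the node names, and the hash-based node selection are all assumptions made for illustration; the source does not specify how categories map to storage locations.

```python
import hashlib
from datetime import date

def storage_path(taken: date, country: str, province: str, city: str, name: str) -> str:
    """Build a hierarchical key from place (country/province/city) and time (year/month/day)."""
    parts = [country, province, city,
             f"{taken.year:04d}", f"{taken.month:02d}", f"{taken.day:02d}", name]
    return "/".join(parts)

def pick_node(category: str, nodes: list) -> str:
    """Choose a storage node for a category using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(category.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings, so that the same category always maps to the same node.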
Fig. 4 is a schematic flow chart of a picture entering method according to a second embodiment of the present invention. In this embodiment, in step S110 of the picture entry method, the step of specifying the preset picture capture rule includes:
Step S210, grabbing according to the specified URL.
Specifically, the user may designate a website for capturing the picture, and capture the existing picture on a webpage corresponding to the designated website.
Step S220, range grabbing is performed by using regular expression matching.
Specifically, regular expression matching is used to define the range of websites to search, and pictures are captured within the search range defined by the regular expression.
In step S230, a page element is designated for grabbing.
Specifically, a page element is specified for grabbing. A web page is composed of individual page elements, for example navigation, website logo, advertisement bar, pictures, text, animation, ornaments, and hyperlinks; together these elements constitute a complete web page, and web pages are an indispensable part of the internet.
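The regular-expression rule of step S220 and the page-element rule of step S230 can be illustrated offline with the Python standard library. This is a sketch under stated assumptions: the class `ElementGrabber` and the functions `grab_by_element` and `grab_by_regex` are hypothetical names, and the example works on an HTML string and a list of candidate URLs rather than fetching anything from the network.

```python
import re
from html.parser import HTMLParser

class ElementGrabber(HTMLParser):
    """Collect the `src` attribute of every occurrence of a chosen tag."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == self.tag:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def grab_by_element(html: str, tag: str = "img") -> list:
    """Page-element rule: return picture sources found in the given element type."""
    parser = ElementGrabber(tag)
    parser.feed(html)
    return parser.sources

def grab_by_regex(urls: list, pattern: str) -> list:
    """Regular-expression rule: keep only URLs inside the range the pattern defines."""
    rx = re.compile(pattern)
    return [u for u in urls if rx.search(u)]
```

The specified-URL rule of step S210 is the degenerate case in which the range contains exactly one address.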
Fig. 5 is a schematic flow chart of a picture entry method according to a third embodiment of the present invention. In this embodiment, the method for selecting a plurality of similar other pictures in step S130 of the picture entry method includes the steps of:
step S310, extracting features of each picture in the first data set.
Specifically, the picture features can be selected and calculated by methods in the prior art, for example, color histogram features, texture features or shape features can be selected.
Step S320, calculating the distance between the features of the current picture and the remaining pictures.
Specifically, the distance of the picture feature can be selected and calculated by methods in the prior art, for example, the euclidean distance can be selected.
In step S330, a preset number of pictures with the smallest distance are selected as the preset number of nearest neighbor pictures of the given picture.
Specifically, the pictures with the smallest distance are selected because a smaller feature distance indicates greater similarity to the given picture.
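Steps S320 and S330 amount to a k-nearest-neighbor search under the Euclidean distance, which can be sketched as follows. The function name `nearest_neighbors` and the dictionary layout are assumptions; feature extraction (step S310) is taken as already done, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbors(features: dict, current: str, k: int) -> list:
    """Return the ids of the k pictures whose features are closest to `current`."""
    target = features[current]
    # Step S320: Euclidean distance from the current picture to every other picture.
    distances = [
        (math.dist(target, feat), pic)
        for pic, feat in features.items()
        if pic != current
    ]
    # Step S330: keep the preset number of pictures with the smallest distance.
    distances.sort()
    return [pic for _, pic in distances[:k]]
```

This brute-force scan compares the current picture with every other picture, which is adequate for a sketch but would usually be replaced by an index structure on a large data set.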
The invention provides a picture input method, a server and a computer readable storage medium. First, a picture capture request is received and a picture capture task is started, wherein the capture task comprises a capture main process, the capture main process analyzes the mapping relation between the capture request and a preset picture capture rule and starts a plurality of capture sub-processes to carry out asynchronous picture capture according to the mapping relation, and the capture sub-processes correspond to a picture capture model established based on the preset picture capture rule. Second, the captured pictures are stored in a first data set, picture attribute information of the pictures in the first data set is acquired, the pictures are preliminarily classified according to the picture attribute information, and the pictures are preliminarily labeled by using the picture attribute information as label information. Third, a picture in the first data set is selected, a plurality of other pictures similar to the picture in picture characteristics are selected in the first data set, a plurality of fitting coefficients of the picture are obtained by fitting the picture characteristics of the picture with the picture characteristics of the other pictures, a label of the picture is constructed from the labels of the other pictures according to the fitting coefficients of the picture, and the picture is labeled again with the constructed label. Finally, the classified and labeled pictures are stored in a distributed manner according to the classification result.
By adopting the picture input method, the server and the computer readable storage medium provided by the invention, the pictures on the network can be quickly obtained, and the obtained pictures can be efficiently and quickly classified and labeled, so that the manpower and material resources are greatly reduced, the cost is greatly saved, and the method is more convenient, quick and accurate compared with the prior art.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A picture input method is applied to a server, and is characterized by comprising the following steps:
receiving a picture grabbing request, starting a picture grabbing task, wherein the grabbing task comprises a main grabbing process, the main grabbing process analyzes the mapping relation between the grabbing request and a preset picture grabbing rule, and starts a plurality of sub grabbing processes according to the mapping relation to carry out picture asynchronous grabbing, and the sub grabbing processes correspond to a picture grabbing model established based on the preset picture grabbing rule;
storing the captured pictures into a first data set, acquiring picture attribute information and picture characteristics of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information;
selecting a first picture in the first data set, selecting a plurality of other pictures which are similar to the first picture in picture characteristics in the first data set, and fitting the picture characteristics of the pictures by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the first picture; constructing a label of the first picture by using labels of the other pictures according to the fitting coefficients of the first picture, and labeling the first picture again by the constructed labels; and
performing distributed storage on the classified and twice-labeled pictures according to the classification result;
wherein, the preset picture capturing rule comprises: the method comprises the steps that a first grabbing rule is used for grabbing according to a specified URL, and a first grabbing model is established based on the first grabbing rule; a second capture rule, which is to use regular matching to capture the range, and establish a second capture model based on the second capture rule; and a third grabbing rule, wherein the third grabbing rule grabs the specified page elements and establishes a third grabbing model based on the third grabbing rule.
2. The picture entering method according to claim 1, wherein in the picture capturing process, a step of simulating manual access to cope with anti-capture restriction of the target website is further included, and the step of simulating manual access specifically includes:
finding out hidden information for logging in the target website, and storing the content of the hidden information, wherein the hidden information is information required for logging in the target website;
submitting the hidden information to simulate logging in a website; and
after the simulated login is successful, obtaining the logged-in information, and capturing the picture of the target website according to the preset picture capturing rule.
3. The picture inputting method of claim 2, wherein the main process is further configured to monitor the number of picture grabbing tasks in the plurality of grabbing sub-processes, when a new picture grabbing task arrives, the main process distributes the new task to a sub-process, of the plurality of grabbing sub-processes, whose number of picture grabbing tasks is smaller than a preset value, and when the picture grabbing tasks of all grabbing sub-processes are larger than the preset value, the main process newly creates a sub-process and distributes the new task to the newly created sub-process.
4. A picture entry method as claimed in any one of claims 1 to 3, wherein the method of selecting a plurality of other pictures which are close together is:
extracting the picture features of each picture in the first data set;
calculating the distance between the features of the current picture and the remaining pictures; and
selecting a preset number of pictures with the minimum distance as the preset number of nearest neighbor pictures of the given picture;
wherein the current picture is a randomly or sequentially selected picture.
5. A picture entry method as claimed in claim 4, wherein the feature is a colour histogram feature, a texture or a shape feature and the distance is a Euclidean distance.
6. A picture entry method as claimed in claim 5, further comprising, in order to obtain labels for all pictures in the first data set, the steps of:
randomly or sequentially selecting one picture in the first data set;
fitting the labels of the selected picture with corresponding fitting coefficients using the labels of a plurality of other pictures corresponding to the selected picture; and
repeating the above steps until a label is constructed for each picture in the first data set.
7. A server, characterized in that it comprises a memory, a processor and a picture-entry system stored on said memory and executable on said processor, said picture-entry system, when executed by said processor, implementing the steps of the picture-entry method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a picture entry system executable by at least one processor to cause the at least one processor to perform the steps of the picture entry method as claimed in any one of claims 1 to 6.
CN201810525540.XA 2018-05-28 2018-05-28 Picture input method, server and computer storage medium Active CN108921193B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810525540.XA CN108921193B (en) 2018-05-28 2018-05-28 Picture input method, server and computer storage medium
PCT/CN2018/102077 WO2019227705A1 (en) 2018-05-28 2018-08-24 Image entry method, server and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810525540.XA CN108921193B (en) 2018-05-28 2018-05-28 Picture input method, server and computer storage medium

Publications (2)

Publication Number Publication Date
CN108921193A CN108921193A (en) 2018-11-30
CN108921193B true CN108921193B (en) 2023-04-18

Family

ID=64419549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810525540.XA Active CN108921193B (en) 2018-05-28 2018-05-28 Picture input method, server and computer storage medium

Country Status (2)

Country Link
CN (1) CN108921193B (en)
WO (1) WO2019227705A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144416A (en) * 2019-12-25 2020-05-12 中国联合网络通信集团有限公司 Information processing method and device
CN111125489B (en) * 2019-12-25 2023-05-26 北京锐安科技有限公司 Data grabbing method, device, equipment and storage medium
CN111178250B (en) * 2019-12-27 2024-01-12 深圳市越疆科技有限公司 Object identification positioning method and device and terminal equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103645939A (en) * 2013-11-29 2014-03-19 北京奇虎科技有限公司 Method and system for capturing images
WO2017016160A1 (en) * 2015-07-30 2017-02-02 北京奇虎科技有限公司 Classification-based storage method for target picture, and corresponding terminal
CN106528702A (en) * 2016-10-26 2017-03-22 朱育盼 Diary generation method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7941009B2 (en) * 2003-04-08 2011-05-10 The Penn State Research Foundation Real-time computerized annotation of pictures
CN106599051B (en) * 2016-11-15 2020-02-07 北京航空航天大学 Automatic image annotation method based on generated image annotation library

Also Published As

Publication number Publication date
WO2019227705A1 (en) 2019-12-05
CN108921193A (en) 2018-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant