CN108921193B - Picture input method, server and computer storage medium - Google Patents
Picture input method, server and computer storage medium
- Publication number
- CN108921193B (application CN201810525540.XA)
- Authority
- CN
- China
- Prior art keywords
- picture
- grabbing
- pictures
- rule
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a picture input method comprising the following steps: receiving a picture grabbing request and starting a picture grabbing task to grab pictures asynchronously; storing the grabbed pictures in a first data set; obtaining picture attribute information and picture features; preliminarily classifying the pictures and preliminarily labeling them using the picture attribute information as label information; selecting a first picture in the first data set and selecting a plurality of other pictures whose picture features are close to those of the first picture; obtaining a plurality of fitting coefficients of the first picture; constructing a label for the first picture from the labels of the other pictures according to the fitting coefficients; and labeling the first picture again with the constructed label. The invention also provides a server and a computer-readable storage medium. The picture input method, server and computer-readable storage medium provided by the invention can classify and label the acquired pictures efficiently and quickly.
Description
Technical Field
The invention relates to the technical field of picture identification, in particular to a picture input method, a server and a computer storage medium.
Background
The basic pictures used for general picture recognition come from scarce sources; for example, they are typically entered into separate data platforms by the organizations that use them, and the recorded information is limited in variety. In addition, the basic pictures require a great deal of manual classification and labeling before recognition. In most projects, 70% of the time is spent on data acquisition and labeling, which costs much time and labor. Manual labeling and sorting are also error-prone and inefficient.
Therefore, how to obtain a large number of pictures quickly and classify and label them efficiently has become a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, the present invention provides a picture input method, a server and a computer storage medium, so as to solve the problem of how to obtain a large number of pictures quickly and classify and label them efficiently.
Firstly, in order to achieve the above object, the present invention provides a picture input method, which comprises the steps of:
receiving a picture grabbing request, starting a picture grabbing task, wherein the grabbing task comprises a main grabbing process, the main grabbing process analyzes the mapping relation between the grabbing request and a preset picture grabbing rule, and starts a plurality of grabbing sub-processes to carry out picture asynchronous grabbing according to the mapping relation, and the grabbing sub-processes correspond to a picture grabbing model established based on the preset picture grabbing rule;
storing the captured pictures into a first data set, acquiring picture attribute information and picture characteristics of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information;
selecting a first picture in the first data set, selecting a plurality of other pictures which are similar to the first picture in picture characteristics in the first data set, and fitting the picture characteristics of the pictures by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the first picture;
constructing a label of the first picture by using labels of the other pictures according to the fitting coefficients of the first picture, and labeling the first picture again by the constructed labels; and
performing distributed storage on the classified and twice-labeled pictures according to the classification result;
wherein, the preset picture capturing rule comprises:
a first grabbing rule, which is to grab according to a specified URL, a first grabbing model being established based on the first grabbing rule;
a second grabbing rule, which is to grab within a range defined by regular-expression matching, a second grabbing model being established based on the second grabbing rule; and
a third grabbing rule, which is to grab specified page elements, a third grabbing model being established based on the third grabbing rule.
Preferably, in the process of capturing pictures, the method further comprises a step of simulating manual access to deal with the anti-grabbing restrictions of the target website, and the step of simulating manual access specifically comprises the following steps:
finding the hidden information for logging in to the target website and storing the content of the hidden information, wherein the hidden information is information required for logging in to the target website;
submitting the hidden information to simulate logging in to the website; and
after the simulated login succeeds, obtaining the logged-in information and capturing the pictures of the target website according to the preset picture capturing rule.
Preferably, the main process is further configured to monitor the number of the image capture tasks in the plurality of capture sub-processes, when a new image capture task arrives, the main process distributes the new task to the sub-process in which the number of the image capture tasks in the plurality of capture sub-processes is smaller than a preset value, and when the number of the image capture tasks in all the capture sub-processes is larger than the preset value, the main process newly creates a sub-process and distributes the new task to the newly created sub-process.
Preferably, the method for selecting a plurality of similar other pictures comprises:
extracting the picture features of each picture in the first data set;
calculating the distance between the features of the current picture and the remaining pictures; and
selecting a preset number of pictures with the minimum distance as the preset number of nearest neighbor pictures of the current picture;
wherein the current picture is a randomly or sequentially selected picture.
Preferably, the feature is a color histogram feature, a texture feature or a shape feature, and the distance is a Euclidean distance.
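As a rough illustration of the neighbor-selection steps above, the following sketch picks the preset number of nearest pictures by Euclidean distance between feature vectors. The function names `euclidean` and `nearest_neighbors` and the idea of passing features as plain lists are assumptions made for this example; the source does not prescribe an implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(features, current, k):
    """Return the indices of the k pictures closest to picture `current`.

    `features` is a list of feature vectors (e.g. colour histograms),
    one per picture in the first data set.
    """
    distances = [
        (euclidean(features[current], feat), idx)
        for idx, feat in enumerate(features)
        if idx != current  # only the remaining pictures are candidates
    ]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]
```

Sorting every distance is the simplest choice; for a large data set, a partial selection or an index structure would likely be preferable.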
Preferably, obtaining a plurality of fitting coefficients for the picture comprises the steps of:
calculating a correlation matrix $C$ of size $k \times k$, wherein the element in the $m$-th row and $n$-th column of the matrix is: $C_{mn} = (X_i - X_{im}) \cdot (X_i - X_{in})$, $m, n = 1, \ldots, k$;
solving the linear system $C W = 1$ to obtain a fitting coefficient vector $W$; and
normalizing each coefficient of the fitting coefficient vector $W$;
wherein the feature corresponding to the current picture is $X_i$, the features of the $k$ nearest neighbor images are $\{X_{i1}, \ldots, X_{ik}\}$, and the fitting coefficient vector is $W = \{w_1, \ldots, w_k\}$.
Preferably, in order to obtain labels of all pictures in the first data set, the method further comprises the steps of:
randomly or sequentially selecting one picture in the first data set;
fitting the labels of the selected picture with corresponding fitting coefficients using the labels of a plurality of other pictures corresponding to the selected picture; and
repeating the above steps until a label is constructed for each picture in the first data set.
In addition, in order to achieve the above object, the present invention further provides a server, which includes a memory, a processor, and a picture entry system stored on the memory and operable on the processor, wherein the picture entry system, when executed by the processor, implements the steps of the picture entry method as described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a picture entry system, which is executable by at least one processor to cause the at least one processor to perform the steps of the picture entry method as described above.
Compared with the prior art, the image input method, the server and the computer readable storage medium provided by the invention firstly receive an image capture request, start an image capture task, wherein the capture task comprises a capture main process, the capture main process analyzes the mapping relation between the capture request and a preset image capture rule, and start a plurality of capture sub-processes to carry out asynchronous image capture according to the mapping relation, and the capture sub-processes correspond to an image capture model established based on the preset image capture rule; secondly, storing the captured pictures into a first data set, acquiring picture attribute information of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information; selecting a picture in the first data set, selecting a plurality of other pictures which are similar to the picture in picture characteristics in the first data set, fitting the picture characteristics of the picture by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the picture, constructing a label of the picture by using the labels of the other pictures according to the fitting coefficients of the picture, and labeling the picture again by using the constructed label; and finally, performing distributed storage on the classified and labeled pictures according to the classification result. 
By adopting the picture input method, the server and the computer readable storage medium provided by the invention, the pictures on the network can be quickly obtained, and the obtained pictures can be efficiently and quickly classified and labeled, so that the manpower and material resources are greatly reduced, the cost is greatly saved, and the method is more convenient, quick and accurate compared with the prior art.
Drawings
FIG. 1 is a schematic diagram of an alternative hardware architecture for a server according to the present invention;
FIG. 2 is a schematic view of program modules of a first embodiment of the picture entry system of the present invention;
FIG. 3 is a schematic flow chart of a first embodiment of the picture input method of the present invention;
FIG. 4 is a schematic flow chart of a second embodiment of the picture input method of the present invention;
FIG. 5 is a schematic flow chart of a third embodiment of the picture input method of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Fig. 1 is a schematic diagram of an alternative hardware architecture of the server 1 according to the present invention.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus. It is noted that fig. 1 only shows the server 1 with components 11-13, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The server 1 may be a rack server, a blade server, or a tower server, and the server 1 may be an independent server or a server cluster formed by a plurality of servers.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the server 1, such as a hard disk or a memory of the server 1. In other embodiments, the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the server 1. Of course, the memory 11 may also comprise both an internal storage unit of the server 1 and an external storage device thereof. In this embodiment, the memory 11 is generally used for storing an operating system installed in the server 1 and various types of application software, such as the program code of the picture entry system 2. Furthermore, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is typically used to control the overall operation of the server 1. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or to process data, for example, to run the picture entry system 2.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is generally used for establishing communication connection between the server 1 and other electronic devices.
The hardware structure and functions of the related devices of the present invention have been described in detail so far. Various embodiments of the present invention will be presented based on the above description.
First, the present invention provides a picture entry system 2.
FIG. 2 is a schematic diagram of the program modules of a first embodiment of the picture entry system 2 according to the present invention.
In this embodiment, the picture entry system 2 includes a series of computer program instructions stored on the memory 11, which, when executed by the processor 12, can implement the picture entry operations of the embodiments of the present invention. In some embodiments, the picture entry system 2 may be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in FIG. 2, the picture entry system 2 can be divided into a picture capturing module 21, a first labeling and classifying module 22, a second labeling and classifying module 23 and a storage module 24. Wherein:
the picture capturing module 21 is configured to receive a picture capturing request and start a picture capturing task, where the capturing task includes a capturing main process, the capturing main process analyzes a mapping relationship between the capturing request and a preset picture capturing rule, and starts a plurality of capturing sub-processes according to the mapping relationship to capture pictures asynchronously, where the capturing sub-processes correspond to a picture capturing model established based on the preset picture capturing rule;
Specifically, the capture request is input by a user, who can choose different ways of capturing pictures on the internet according to different requirements. For example, the user can specify a website and capture the pictures on the web page corresponding to that website. The user can also use regular-expression matching to define a search range of websites and capture pictures within that range. A regular expression (often abbreviated as regex, regexp or RE in code) is a concept from computer science that is typically used to retrieve or replace text conforming to a certain pattern (rule). It is a logical formula for operating on character strings, which may contain ordinary characters (e.g., the letters a to z) and special characters (called metacharacters): a "rule string" is formed from predefined specific characters and combinations of them, and expresses a filtering logic for character strings. In other words, a regular expression is a text pattern that describes one or more strings to be matched when searching text. For example, a regular expression matching a complete domain name may be: ?
For example: www.baidu.com. The regular expression for matching a web address may be:
^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*$
The regular expression for matching an HTTP URL may be:
^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(:\d+)*(\/\w+\.\w+)*([\?&]\w+=\w*)*$
For example: http://www.tetet.com/index.html?q=1&m=test
The regular expressions are written according to DNS regulations, and according to the DNS regulations, labels in the domain names are all composed of English letters and numbers, each label does not exceed 63 characters, and the upper and lower case letters are not distinguished. The punctuation marks other than the hyphen (-) cannot be used in the labels. The domain name with the lowest rank is written to the far left, and the domain name with the highest rank is written to the far right. The complete domain name, which is composed of multiple labels, does not exceed 255 characters in total. The use of regular expressions is merely an example and will not be described in detail herein.
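By way of illustration only, the HTTP URL expression given above can be tried out with Python's standard `re` module. The pattern is copied from the description; the helper name `is_in_capture_range` is an assumption introduced for this sketch.

```python
import re

# Pattern copied from the description above (HTTP URL form).
URL_PATTERN = re.compile(
    r"^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?"
    r"[a-zA-Z0-9][-a-zA-Z0-9]{0,62}"          # first label, at most 63 characters
    r"(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+"     # one or more further labels
    r"(:\d+)*(\/\w+\.\w+)*([\?&]\w+=\w*)*$"   # optional port, file path, query
)


def is_in_capture_range(url):
    """Return True when `url` matches the range-defining expression."""
    return URL_PATTERN.match(url) is not None
```

The leading lookahead enforces the overall 255-character limit, while the `{0,62}` quantifiers enforce the 63-character limit on each label.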
Specifically, the user may also specify page elements for grabbing. The web page is composed of individual web page elements, for example, navigation, website logo, advertisement bar, picture, text, animation, ornament, hyperlink, etc., and these various elements constitute a complete web page, and the individual web pages are the most indispensable parts in the internet.
Specifically, the preset picture capture rule includes:
a first grabbing rule, which is to grab according to a specified URL, a first grabbing model being established based on the first grabbing rule;
a second grabbing rule, which is to grab within a range defined by regular-expression matching, a second grabbing model being established based on the second grabbing rule; and
a third grabbing rule, which is to grab specified page elements, a third grabbing model being established based on the third grabbing rule.
Specifically, the picture capture models are established in correspondence with the preset picture capture rules. For example, the preset picture capture rules are: 1. grabbing according to a specified URL; 2. grabbing within a range using regular-expression matching; 3. grabbing specified page elements in sequence. Correspondingly, a specified-URL picture capture model, a regular-matching picture capture model and a specified-element picture capture model are respectively established.
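The mapping from a capture request to one of the three capture models might be sketched as follows. This is a minimal sketch under several assumptions that are not in the source: the page HTML has already been downloaded and is well nested, a request is a dictionary with a `rule` key, the regular expression of the second rule is applied to picture URLs, and the third rule identifies a page element by its `id`. The names `ImgCollector`, `CAPTURE_MODELS` and `dispatch` are hypothetical.

```python
import re
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects <img src> values, optionally only inside the element with a given id."""

    VOID = {"img", "br", "hr", "input", "meta", "link"}  # tags without a closing tag

    def __init__(self, element_id=None):
        super().__init__()
        self.element_id = element_id
        self.depth = 0  # greater than 0 while inside the target element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            if self.element_id is None or self.depth > 0:
                self.sources.append(attrs["src"])
        if tag in self.VOID:
            return  # void elements never change the nesting depth
        if self.depth > 0:
            self.depth += 1
        elif self.element_id is not None and attrs.get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in self.VOID:
            self.depth -= 1


def capture_by_url(html):
    """First rule: every picture on the page of the specified URL."""
    collector = ImgCollector()
    collector.feed(html)
    return collector.sources


def capture_by_regex(html, pattern):
    """Second rule (assumed form): pictures whose URL matches the pattern."""
    return [src for src in capture_by_url(html) if re.search(pattern, src)]


def capture_by_element(html, element_id):
    """Third rule: pictures inside the specified page element."""
    collector = ImgCollector(element_id)
    collector.feed(html)
    return collector.sources


# Mapping relation between a request's rule name and its capture model.
CAPTURE_MODELS = {
    "url": capture_by_url,
    "regex": capture_by_regex,
    "element": capture_by_element,
}


def dispatch(request, html):
    """Resolve the request to a capture model and run it on the page."""
    model = CAPTURE_MODELS[request["rule"]]
    return model(html, *request.get("args", ()))
```

Keeping the rule-to-model mapping in a dictionary means a further rule could be added without changing `dispatch`.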
Specifically, in the process of capturing pictures, some websites impose capture restrictions, for example requiring a login before a web page can be viewed. In that case a simulated manual access step may be set, which may include:
1. finding the hidden information for logging in to the website and storing its content; specifically, opening the developer tools, logging in manually once, and locating the data segment that contains the hidden information, i.e. the information required for login;
2. submitting the information to simulate logging in to the website;
3. after the simulated login succeeds, acquiring the logged-in information.
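The first two steps above could look roughly like the sketch below, which collects hidden form fields from a login page and builds the body of the login submission. It assumes the hidden information is carried in `<input type="hidden">` fields and that the credential fields are named `username` and `password`; both are assumptions for illustration, since the source does not say how a particular site carries this information. No network request is made here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Stores the name and value of every hidden input on a login page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.hidden[name] = attrs.get("value") or ""


def build_login_payload(login_page_html, username, password):
    """Combine the stored hidden information with the credentials."""
    parser = HiddenFieldParser()
    parser.feed(login_page_html)
    payload = dict(parser.hidden)
    payload["username"] = username  # hypothetical field name
    payload["password"] = password  # hypothetical field name
    return urlencode(payload)
```

The returned string would then be submitted to the site's login address, and the session information returned on success would be reused for the subsequent capture requests.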
Specifically, the main process is further configured to monitor the number of the image capturing tasks in each sub-process, when a new image capturing task arrives, the main process distributes the new task to the sub-processes in which the number of the image capturing tasks is smaller than a preset value, and when the number of the image capturing tasks of all the sub-processes is larger than the preset value, the main process newly builds a sub-process and distributes the new task to the newly built sub-process.
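The distribution policy described above can be sketched as a small bookkeeping class. The class name `CaptureDispatcher` is hypothetical, and the sketch only tracks task counts rather than starting real sub-processes; it also assumes that a sub-process whose count equals the preset value is treated as full, a case the text leaves open.

```python
class CaptureDispatcher:
    """Tracks how many capture tasks each sub-process currently holds."""

    def __init__(self, preset_value):
        self.preset_value = preset_value
        self.task_counts = []  # task_counts[i] = tasks held by sub-process i

    def assign(self):
        """Assign one new task and return the index of the chosen sub-process."""
        for idx, count in enumerate(self.task_counts):
            if count < self.preset_value:
                self.task_counts[idx] += 1
                return idx
        # Every existing sub-process is full: create a new one for the task.
        self.task_counts.append(1)
        return len(self.task_counts) - 1

    def finish(self, idx):
        """Record that sub-process `idx` has completed one task."""
        self.task_counts[idx] -= 1
```

Choosing the first sub-process below the preset value is only one possible policy; picking the least-loaded sub-process would satisfy the same description.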
The first labeling and classifying module 22 is configured to store the captured pictures in a first data set, acquire picture attribute information of the pictures in the first data set, preliminarily classify the pictures according to the picture attribute information, and preliminarily label the pictures by using the picture attribute information as label information.
Specifically, the picture attribute information includes the time, the place, the picture name, etc. The pictures can be classified according to the time and place at which they were generated; for example, they can be classified by year, by month and by date, and they can be classified by country, province, city, district, county, etc. The picture attribute information is stored in the picture and can be read by writing a picture attribute reading program. The step of obtaining the picture attribute information comprises: 1. loading the picture information; 2. analyzing and filtering the picture information to acquire the picture attribute information of the picture; and 3. outputting the picture attribute information of the picture.
Specifically, the obtained image attribute information may be screened, and the screened image attribute information is used as tag information to preliminarily label the image, for example, time, place, and image name in the image may be selected to label the image. The classification of pictures is one of the main methods for labeling pictures, and since a picture can be labeled with a plurality of class labels, the labeling of pictures based on classification is a multi-label picture classification problem. In addition, the picture classification can also be used for automatic filing of the pictures, so that intra-class retrieval is realized, and the query efficiency is improved.
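The preliminary classification and labeling could be sketched as below, assuming the attribute information has already been read into a dictionary with `time`, `place` and `name` keys and that the time is an ISO-style date string. The record layout and the function names are assumptions for illustration; reading the attributes out of the picture file itself is not shown.

```python
from collections import defaultdict
from datetime import datetime


def preliminary_labels(attrs):
    """Use the screened attribute information directly as label information."""
    taken = datetime.strptime(attrs["time"], "%Y-%m-%d")
    return [str(taken.year), f"{taken.year}-{taken.month:02d}", attrs["place"], attrs["name"]]


def classify_by_year_and_place(pictures):
    """Group picture names by (year, place), one possible classification."""
    groups = defaultdict(list)
    for attrs in pictures:
        year = datetime.strptime(attrs["time"], "%Y-%m-%d").year
        groups[(year, attrs["place"])].append(attrs["name"])
    return dict(groups)
```

Because each picture receives several labels at once (year, month, place, name), this matches the multi-label character of the classification noted above.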
The second labeling and classifying module 23 is configured to select a picture in the first data set, select a plurality of other pictures similar to the picture in picture characteristics in the first data set, obtain a plurality of fitting coefficients of the picture by fitting the picture characteristics of the pictures with the picture characteristics of the other pictures, construct a label of the picture by using the labels of the other pictures according to the fitting coefficients of the picture, and label the picture again by using the constructed label.
Specifically, a picture is usually associated with some text description information, such as a title, a subject word, comment information, and the like, to indicate information such as the content of the picture, a shooting location, personal feelings, and evaluations. Therefore, the pictures can be labeled based on the information, or the subject words can be directly used as the labels.
It should be noted that, in the pictures captured from the web, a part of the pictures contains tags, and a part of the pictures does not contain tags, and it is a central idea of the method to tag the pictures without tags by using similar pictures with tags.
The storage module 24 is configured to perform distributed storage on the classified and labeled pictures according to a classification result.
In particular, storing the pictures in a distributed manner according to their categories facilitates picture management and searching. For example, the picture attribute information includes the time, the place, the picture name, etc.; the pictures can be classified according to the time and place at which they were generated, for example by year, by month and by date, or by country, province, city, district, county, etc.
In addition, the invention also provides a picture input method.
FIG. 3 is a schematic flow chart of a first embodiment of the picture input method of the present invention. In this embodiment, the execution order of the steps in the flowchart shown in FIG. 3 may be changed and some steps may be omitted according to different requirements.
Step S110, receiving a picture capturing request, starting a picture capturing task, wherein the capturing task comprises a capturing main process, the capturing main process analyzes the mapping relation between the capturing request and a preset picture capturing rule, and starts a plurality of capturing sub-processes to perform picture asynchronous capturing according to the mapping relation, and the capturing sub-processes correspond to a picture capturing model established based on the preset picture capturing rule.
Specifically, the capture request is input by a user, who can capture pictures on the internet in different ways according to different needs. For example, the user can specify a website from which to capture pictures, can use regular-expression matching to define a search range of websites and capture pictures within that range, or can specify page elements to capture. The specified page elements can be captured recursively or in sequence.
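The asynchronous capture in step S110 might be sketched as follows. Note that this example swaps in `asyncio` tasks within a single process as a stand-in for the capture sub-processes of the source, and the worker merely simulates a download; the function names and the shape of a request as a `(rule, target)` pair are assumptions.

```python
import asyncio


async def capture_worker(rule, target):
    """Stand-in for one capture sub-process; a real one would download pictures."""
    await asyncio.sleep(0)  # yield control, simulating network I/O
    return f"{rule}:{target}"


async def capture_main(requests):
    """Stand-in for the capture main process: start one worker per request."""
    tasks = [
        asyncio.create_task(capture_worker(rule, target))
        for rule, target in requests
    ]
    # gather() waits for all workers and keeps the results in request order.
    return await asyncio.gather(*tasks)
```

A caller would run the main coroutine with `asyncio.run(capture_main(requests))`.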
Step S120, storing the captured pictures in a first data set, acquiring picture attribute information of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information.
In particular, picture classification is one of the main methods for labeling pictures, and since a picture can be labeled with multiple category labels, the classification-based picture labeling is a multi-label picture classification problem. In addition, the image classification can also be used for automatic filing of the images, so that intra-class retrieval is realized, and the query efficiency is improved.
Step S130, selecting a picture in the first data set, selecting a plurality of other pictures similar to the picture in picture characteristics in the first data set, obtaining a plurality of fitting coefficients of the picture by fitting the picture characteristics of the picture with the picture characteristics of the other pictures, constructing a label of the picture by using the labels of the other pictures according to the fitting coefficients of the picture, and labeling the picture again through the constructed label.
Specifically, obtaining a plurality of fitting coefficients of the picture comprises the steps of:
the plurality of fitting coefficients for a given picture are obtained by minimizing the error in fitting the given picture by a plurality of other pictures that are similar in picture characteristics to the picture.
The steps of obtaining the fitting coefficients are described below, taking a given picture and its k nearest neighbor pictures as an example.
Assume that the feature of the current picture is $x_i$, the features of its k nearest neighbor pictures are $\{x_{i_1}, \ldots, x_{i_k}\}$, and the fitting coefficient vector is $W = (w_1, \ldots, w_k)$.
1, calculating a correlation matrix $C$ of size $k \times k$, whose element in the m-th row and n-th column is $C_{mn} = (x_i - x_{i_m})^{\top}(x_i - x_{i_n})$, $m, n = 1, \ldots, k$;
2, solving the linear system $C W = \mathbf{1}$, where $\mathbf{1}$ is the k-dimensional vector of ones, to obtain the fitting coefficient vector $W$;
and 3, normalizing the coefficients of $W$, that is, dividing each element of $W$ by the sum of all its elements: $w_j \leftarrow w_j / \sum_{l=1}^{k} w_l$.
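The fitting-coefficient steps above can be sketched in plain Python as follows. This is a minimal sketch, not the patented implementation: the function names are hypothetical, and the small regularization term added to the diagonal of C is an assumption introduced here because C is singular whenever k exceeds the feature dimension or the neighbors are collinear with the given picture.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fitting_coefficients(x_i, neighbours, reg=1e-3):
    """Return normalized fitting coefficients of x_i w.r.t. its neighbours."""
    k = len(neighbours)
    # Difference vectors x_i - x_{i_m}.
    diffs = [[a - b for a, b in zip(x_i, nb)] for nb in neighbours]
    # Step 1: C[r][s] is the inner product of difference vectors r and s.
    c = [[sum(p * q for p, q in zip(diffs[r], diffs[s])) for s in range(k)]
         for r in range(k)]
    # Assumption (not in the source): scale a ridge term by the trace of C.
    trace = sum(c[j][j] for j in range(k))
    eps = reg * trace / k if trace > 0 else reg
    for j in range(k):
        c[j][j] += eps
    # Step 2: solve C W = 1.
    w = solve_linear(c, [1.0] * k)
    # Step 3: normalize so the coefficients sum to 1.
    total = sum(w)
    return [v / total for v in w]


# A picture feature lying midway between its two neighbours.
weights = fitting_coefficients([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])
```

In this example the two neighbors are symmetric about the given feature, so the coefficients come out equal, and the weighted sum of the neighbor features reproduces the given feature.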
Specifically, in order to obtain labels of all pictures in the first data set, the method further includes the steps of:
1, randomly or sequentially selecting one picture in the picture set;
2, constructing the label of the selected picture from the labels of the plurality of other pictures corresponding to the selected picture, weighted by the corresponding fitting coefficients;
and 3, repeating the step 1 and the step 2 until a label is constructed for each picture in the picture set.
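The label-construction loop above can be sketched as follows, assuming that each label is represented as a numeric vector with one entry per category and that the constructed label is the coefficient-weighted sum of the neighbor labels. Both the representation and the function names are assumptions for illustration; the sketch also reads every neighbor label from the original labels rather than from labels already rebuilt in the same pass, which is a design choice the source does not specify.

```python
def construct_label(weights, neighbour_labels):
    """Weighted combination of neighbour label vectors."""
    dim = len(neighbour_labels[0])
    return [
        sum(w * label[d] for w, label in zip(weights, neighbour_labels))
        for d in range(dim)
    ]


def relabel_all(labels, neighbours, weights):
    """Construct a label for every picture in the set.

    labels:     picture id -> preliminary label vector
    neighbours: picture id -> list of neighbour picture ids
    weights:    picture id -> fitting coefficients, same order as neighbours
    """
    new_labels = {}
    for pic in labels:  # pictures are selected sequentially
        nb_labels = [labels[n] for n in neighbours[pic]]
        new_labels[pic] = construct_label(weights[pic], nb_labels)
    return new_labels


labels = {"a": [1, 0], "b": [1, 0], "c": [0, 1]}
neighbours = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
weights = {"a": [0.75, 0.25], "b": [0.5, 0.5], "c": [0.5, 0.5]}
new_labels = relabel_all(labels, neighbours, weights)
```

The resulting vectors can be read as per-category scores; turning them back into discrete labels (for example by thresholding) is left open here.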
And step S140, performing distributed storage on the classified and labeled pictures according to the classification result.
Specifically, storing pictures in a distributed manner according to their categories facilitates picture management and searching. For example, the picture attribute information includes the time and place at which the picture was generated, the picture name, and so on, and the pictures can be classified by time and by place: by time at three levels (year, month and date), and by place according to country, province, city, district, county, and so on.
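One way to picture category-based distributed storage is to derive a category key from the time and place attributes and map that key to a storage node. The sketch below is an assumption-laden illustration: the attribute field names, the key layout, the node names, and the hash-based node selection are all hypothetical and are not taken from the description.

```python
import hashlib

TIME_LEVELS = ("year", "month", "day")
PLACE_LEVELS = ("country", "province", "city", "district", "county")


def category_key(attrs, time_level="month", place_level="city"):
    """Build a key such as '2018/05/China/Guangdong/Shenzhen'.

    attrs["date"] is assumed to be in 'YYYY-MM-DD' form.
    """
    date_parts = attrs["date"].split("-")
    time_part = date_parts[: TIME_LEVELS.index(time_level) + 1]
    place_names = PLACE_LEVELS[: PLACE_LEVELS.index(place_level) + 1]
    place_part = [attrs[name] for name in place_names if name in attrs]
    return "/".join(time_part + place_part)


def storage_node(key, nodes):
    """Pick a node deterministically from the category key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


attrs = {
    "date": "2018-05-28",
    "country": "China",
    "province": "Guangdong",
    "city": "Shenzhen",
}
key = category_key(attrs)
node = storage_node(key, ["node-0", "node-1", "node-2"])
```

Pictures that share a category key always land on the same node, so a search within one category touches a single node.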
Fig. 4 is a schematic flow chart of a picture entry method according to a second embodiment of the present invention. In this embodiment, in step S110 of the picture entry method, the step of specifying the preset picture capture rule includes:
Step S210, grabbing according to a specified URL.
Specifically, the user may designate a website for capturing the picture, and capture the existing picture on a webpage corresponding to the designated website.
Step S220, range grabbing is performed by using regular expression matching.
Specifically, a regular expression is used to match the websites within a search range, and pictures are captured within the search range that the regular expression defines.
In step S230, a page element is designated for grabbing.
Specifically, a page element is specified for grabbing. A web page is composed of individual page elements, for example navigation, website logo, advertisement bar, pictures, text, animation, ornaments, hyperlinks, and so on; together these elements constitute a complete web page, and web pages are an indispensable part of the internet.
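The three capture rules of steps S210 to S230 can be sketched with the Python standard library as follows. This is only a sketch under stated assumptions: the pages are passed in as already-fetched HTML strings (fetching, asynchronous subprocesses and login handling are out of scope), a page element is identified by its `id` attribute, and the function names and the `example.com` URLs are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ImageCollector(HTMLParser):
    """Collect <img src> URLs, optionally only inside one element."""

    def __init__(self, base_url, element_id=None):
        super().__init__()
        self.base_url = base_url
        self.element_id = element_id
        self.depth = 0  # greater than 0 while inside the target element
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in VOID_TAGS:  # void tags never change the nesting depth
            inside = self.element_id is None or self.depth > 0
            if tag == "img" and inside and attrs.get("src"):
                self.images.append(urljoin(self.base_url, attrs["src"]))
            return
        if self.depth > 0:
            self.depth += 1
        elif self.element_id is not None and attrs.get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def grab_from_page(html, page_url, element_id=None):
    """Rule 1 (and rule 3 when element_id is given)."""
    collector = ImageCollector(page_url, element_id)
    collector.feed(html)
    collector.close()
    return collector.images


def grab_in_range(pages, url_pattern):
    """Rule 2: only pages whose URL matches the regular expression."""
    regex = re.compile(url_pattern)
    images = []
    for url, html in pages.items():
        if regex.search(url):
            images.extend(grab_from_page(html, url))
    return images


PAGE = ('<div id="gallery"><p><img src="/img/a.jpg"></p>'
        '<img src="b.png"/></div><img src="/ads/c.gif">')
URL = "https://example.com/photos/index.html"
all_images = grab_from_page(PAGE, URL)
gallery_images = grab_from_page(PAGE, URL, element_id="gallery")
ranged = grab_in_range(
    {URL: PAGE, "https://example.com/about.html": '<img src="logo.png">'},
    r"^https://example\.com/photos/",
)
```

Here `all_images` corresponds to grabbing by specified URL, `ranged` to grabbing within a regular-expression-defined range, and `gallery_images` to grabbing from a specified page element.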
Fig. 5 is a schematic flow chart of a picture entry method according to a third embodiment of the present invention. In this embodiment, the method for selecting a plurality of similar other pictures in step S130 of the picture entry method includes the steps of:
step S310, extracting features of each picture in the first data set.
Specifically, the picture features can be selected and calculated by methods in the prior art, for example, color histogram features, texture features or shape features can be selected.
Step S320, calculating the distance between the features of the current picture and the remaining pictures.
Specifically, the distance of the picture feature can be selected and calculated by methods in the prior art, for example, the euclidean distance can be selected.
In step S330, a preset number of pictures with the smallest distance are selected as the preset number of nearest neighbor pictures of the given picture.
Specifically, selecting the pictures with the smallest distance amounts to selecting the pictures with the greatest similarity to the given picture.
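Steps S310 to S330 can be sketched as follows, using a color histogram as the picture feature and the Euclidean distance, both of which the description names as possible choices. The histogram layout (four bins per channel), the representation of a picture as a list of RGB tuples, and the function names are assumptions for this example.

```python
import math


def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(value * bins // 256, bins - 1)
            hist[channel * bins + index] += 1
    total = len(pixels)
    return [h / total for h in hist]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_neighbours(current, features, k):
    """Names of the k pictures whose features are closest to `current`."""
    others = [
        (euclidean(features[current], feat), name)
        for name, feat in features.items()
        if name != current
    ]
    return [name for _, name in sorted(others)[:k]]


features = {
    "red1": color_histogram([(255, 0, 0)] * 4),
    "red2": color_histogram([(250, 10, 5)] * 4),
    "blue": color_histogram([(0, 0, 255)] * 4),
    "mixed": color_histogram([(255, 0, 0), (0, 0, 255)]),
}
neighbours = nearest_neighbours("red1", features, k=2)
```

With these toy pictures, the two nearest neighbors of `red1` are the other red picture and then the half-red picture, while the blue picture is the farthest.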
The invention provides a picture input method, a server and a computer readable storage medium. First, a picture capture request is received and a picture capture task is started; the capture task comprises a capture main process, which analyzes the mapping relation between the capture request and a preset picture capture rule and, according to the mapping relation, starts a plurality of capture subprocesses to capture pictures asynchronously, wherein the capture subprocesses correspond to a picture capture model established based on the preset picture capture rule. Second, the captured pictures are stored in a first data set, picture attribute information of the pictures in the first data set is acquired, the pictures are preliminarily classified according to the picture attribute information, and the pictures are preliminarily labeled by using the picture attribute information as label information. Third, a picture in the first data set is selected, a plurality of other pictures similar to it in picture characteristics are selected in the first data set, a plurality of fitting coefficients of the picture are obtained by fitting its picture characteristics with the picture characteristics of the other pictures, a label of the picture is constructed from the labels of the other pictures according to the fitting coefficients, and the picture is labeled again with the constructed label. Finally, the classified and labeled pictures are stored in a distributed manner according to the classification result.
By adopting the picture input method, the server and the computer readable storage medium provided by the invention, the pictures on the network can be quickly obtained, and the obtained pictures can be efficiently and quickly classified and labeled, so that the manpower and material resources are greatly reduced, the cost is greatly saved, and the method is more convenient, quick and accurate compared with the prior art.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (8)
1. A picture input method is applied to a server, and is characterized by comprising the following steps:
receiving a picture grabbing request, starting a picture grabbing task, wherein the grabbing task comprises a main grabbing process, the main grabbing process analyzes the mapping relation between the grabbing request and a preset picture grabbing rule, and starts a plurality of sub grabbing processes according to the mapping relation to carry out picture asynchronous grabbing, and the sub grabbing processes correspond to a picture grabbing model established based on the preset picture grabbing rule;
storing the captured pictures into a first data set, acquiring picture attribute information and picture characteristics of the pictures in the first data set, preliminarily classifying the pictures according to the picture attribute information, and preliminarily labeling the pictures by using the picture attribute information as label information;
selecting a first picture in the first data set, selecting a plurality of other pictures which are similar to the first picture in picture characteristics in the first data set, and fitting the picture characteristics of the pictures by using the picture characteristics of the other pictures to obtain a plurality of fitting coefficients of the first picture; constructing a label of the first picture by using labels of the other pictures according to the fitting coefficients of the first picture, and labeling the first picture again by the constructed labels; and
performing distributed storage on the classified and twice-labeled pictures according to the classification result;
wherein, the preset picture capturing rule comprises: the method comprises the steps that a first grabbing rule is used for grabbing according to a specified URL, and a first grabbing model is established based on the first grabbing rule; a second capture rule, which is to use regular matching to capture the range, and establish a second capture model based on the second capture rule; and a third grabbing rule, wherein the third grabbing rule grabs the specified page elements and establishes a third grabbing model based on the third grabbing rule.
2. The picture entering method according to claim 1, wherein in the picture capturing process, a step of simulating manual access to cope with anti-capture restriction of the target website is further included, and the step of simulating manual access specifically includes:
finding out hidden information for logging in the target website, and storing the content of the hidden information, wherein the hidden information is information required for logging in the target website;
submitting the hidden information to simulate logging in a website; and
after the simulated login is successful, obtaining the logged-in information, and capturing the picture of the target website according to the preset picture capturing rule.
3. The picture inputting method of claim 2, wherein the main process is further configured to monitor the number of picture grabbing tasks in the plurality of grabbing sub-processes, when a new picture grabbing task arrives, the main process distributes the new task to a sub-process, of the plurality of grabbing sub-processes, whose number of picture grabbing tasks is smaller than a preset value, and when the picture grabbing tasks of all grabbing sub-processes are larger than the preset value, the main process newly creates a sub-process and distributes the new task to the newly created sub-process.
4. A picture entry method as claimed in any one of claims 1 to 3, wherein the method of selecting a plurality of other pictures which are close together is:
extracting the picture features of each picture in the first data set;
calculating the distance between the features of the current picture and the remaining pictures; and
selecting a preset number of pictures with the minimum distance as the preset number of nearest neighbor pictures of the given picture;
wherein the current picture is a randomly or sequentially selected picture.
5. A picture entry method as claimed in claim 4, wherein the feature is a colour histogram feature, a texture or a shape feature and the distance is a Euclidean distance.
6. A picture entry method as claimed in claim 5, further comprising, in order to obtain labels for all pictures in the first data set, the steps of:
randomly or sequentially selecting one picture in the first data set;
fitting the labels of the selected picture with corresponding fitting coefficients using the labels of a plurality of other pictures corresponding to the selected picture; and
repeating the above steps until a label is constructed for each picture in the first data set.
7. A server, characterized in that it comprises a memory, a processor and a picture-entry system stored on said memory and executable on said processor, said picture-entry system, when executed by said processor, implementing the steps of the picture-entry method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a picture entry system executable by at least one processor to cause the at least one processor to perform the steps of the picture entry method as claimed in any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810525540.XA CN108921193B (en) | 2018-05-28 | 2018-05-28 | Picture input method, server and computer storage medium |
PCT/CN2018/102077 WO2019227705A1 (en) | 2018-05-28 | 2018-08-24 | Image entry method, server and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810525540.XA CN108921193B (en) | 2018-05-28 | 2018-05-28 | Picture input method, server and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108921193A CN108921193A (en) | 2018-11-30 |
CN108921193B true CN108921193B (en) | 2023-04-18 |
Family
ID=64419549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810525540.XA Active CN108921193B (en) | 2018-05-28 | 2018-05-28 | Picture input method, server and computer storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108921193B (en) |
WO (1) | WO2019227705A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144416A (en) * | 2019-12-25 | 2020-05-12 | 中国联合网络通信集团有限公司 | Information processing method and device |
CN111125489B (en) * | 2019-12-25 | 2023-05-26 | 北京锐安科技有限公司 | Data grabbing method, device, equipment and storage medium |
CN111178250B (en) * | 2019-12-27 | 2024-01-12 | 深圳市越疆科技有限公司 | Object identification positioning method and device and terminal equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103645939A (en) * | 2013-11-29 | 2014-03-19 | 北京奇虎科技有限公司 | Method and system for capturing images |
WO2017016160A1 (en) * | 2015-07-30 | 2017-02-02 | 北京奇虎科技有限公司 | Classification-based storage method for target picture, and corresponding terminal |
CN106528702A (en) * | 2016-10-26 | 2017-03-22 | 朱育盼 | Diary generation method and apparatus |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7941009B2 (en) * | 2003-04-08 | 2011-05-10 | The Penn State Research Foundation | Real-time computerized annotation of pictures |
CN106599051B (en) * | 2016-11-15 | 2020-02-07 | 北京航空航天大学 | Automatic image annotation method based on generated image annotation library |
Also Published As
Publication number | Publication date |
---|---|
WO2019227705A1 (en) | 2019-12-05 |
CN108921193A (en) | 2018-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||