WO2017088701A1 - Massive picture management method and device - Google Patents

Info

Publication number
WO2017088701A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
full
library
image
latest
Prior art date
Application number
PCT/CN2016/106326
Other languages
English (en)
French (fr)
Inventor
张增明
陈智强
陈德品
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017088701A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51: Indexing; Data structures therefor; Storage structures

Definitions

  • The present application relates to the field of computer technology, and in particular to a massive picture management method and a massive picture management device.
  • An online trading platform hosts a large number of products, each of which has at least one corresponding picture. Take AliExpress as an example: there are about 150 million products on the platform, and each product has 1 to 6 main pictures displayed on search, shopping-guide and other pages, plus a number of detail pictures describing the product. As the business develops, a large number of new pictures are uploaded to the platform every day.
  • Based on these pictures, various kinds of processing and analysis can be performed, for example judging whether two products are similar or identical according to picture content, evaluating picture quality based on picture content, or identifying whether a product is infringing.
  • In view of the above problems, the present application is proposed to provide a massive picture management method and a corresponding massive picture management device that overcome, or at least partially solve, those problems.
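  • According to one aspect of the present application, a method for managing massive pictures is provided, including:
  • acquiring a plurality of latest pictures updated on the current day;
  • uploading the latest pictures in parallel, through multiple transmission threads, to a preset increasing gallery in a distributed server cluster, where a full-quantity library is also deployed in the distributed server cluster;
  • saving, by comparing picture indexes, the latest pictures in the increasing gallery that do not exist in the full-quantity library to the full-quantity library;
  • after receiving a request from an application to call a picture, extracting the target picture from the full-quantity library and feeding it back to the application.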
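  • Optionally, before acquiring the plurality of latest pictures updated on the current day, the method further includes:
  • obtaining updated latest product information by parsing a product update record;
  • parsing the link address of the latest picture from the latest product information, and acquiring the latest picture according to the link address.
  • Optionally, saving, by comparing picture indexes, the latest pictures in the increasing gallery that do not exist in the full-quantity library to the full-quantity library includes:
  • comparing the picture index of each latest picture in the increasing gallery with a preset historical index library, where the historical index library stores the picture indexes of all pictures in the full-quantity library;
  • extracting the latest pictures whose picture index does not exist in the historical index library and saving them to the full-quantity library.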
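  • Optionally, the method further includes:
  • adding the picture index of each latest picture added to the full-quantity library to the historical index library.
  • Optionally, the pictures in the full-quantity library are distributed across a plurality of storage areas of the server cluster according to multi-level picture categories, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with a corresponding picture identifier and multi-level picture category;
  • extracting the target picture from the full-quantity library and feeding it back to the application includes:
  • parsing the request for calling the picture to obtain the target multi-level picture category of the required target picture;
  • extracting the target picture from the full-quantity library according to the storage location of the storage area corresponding to each level of the target multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.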
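  • Optionally, each day corresponds to one daily increasing gallery, and the method further includes:
  • deleting the increasing galleries that fall outside a preset time period.
  • Optionally, the method further includes:
  • determining the online pictures corresponding to products still in use online by querying product history access data, and/or determining the online pictures still in use online by querying picture history calling data;
  • deleting the pictures in the full-quantity library other than the online pictures.
  • Optionally, the method further includes:
  • finding the picture categories whose modulo value equals the value corresponding to the current day, as the picture categories to be cleaned;
  • in which case deleting the pictures other than the online pictures is: for each picture category to be cleaned, deleting the pictures in that category of the full-quantity library other than the online pictures.
  • Optionally, when the latest pictures in the increasing gallery that do not exist in the full-quantity library are saved to the full-quantity library by comparing picture indexes, the method further includes:
  • for a latest picture whose corresponding original picture exists in the full-quantity library, saving the latest picture to the full-quantity library in place of the original picture.
  • Optionally, the method further includes:
  • when the execution time of a transmission thread is detected to exceed a preset time, ending that transmission thread and starting a new transmission thread to perform the corresponding task in its place;
  • monitoring the network connection API and, when a network connection abnormality notification issued by the network connection API is captured, ending all transmission threads and starting a corresponding number of new transmission threads to perform the corresponding tasks in their place.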
  • Optionally, extracting the target picture from the full-quantity library and feeding it back to the application is: searching for the target picture in the full-quantity library, extracting a picture feature of the target picture, and feeding the feature back to the application;
  • the picture index is a picture number and a picture identifier of the picture.
  • the application also provides a massive picture management device, comprising:
  • a picture acquisition module configured to obtain a plurality of latest pictures updated on the current day
  • a picture uploading module configured to upload the latest picture in parallel to a preset increasing library in a distributed server cluster by using multiple transmission threads, where a full-scale library is also deployed in the distributed server cluster;
  • a picture saving module configured to save, by comparing the picture index, the latest picture in the increasing gallery that does not exist in the full-quantity library to the full-quantity library;
  • a picture feedback module configured to: after receiving the request for the application to call the image, extract the target image from the full-quantity library and feed back to the application.
  • the device further comprises:
  • a latest product analysis module configured to obtain updated latest product information by parsing the product update record before acquiring the plurality of latest pictures updated on the current day;
  • the link address access module is configured to parse the link address of the latest picture from the latest product information, and obtain the latest picture according to the link address.
  • the picture saving module includes:
  • An index matching sub-module configured to compare a picture index of a latest picture in the increasing gallery with a preset historical index library, where the image index of all pictures in the full-quantity library is saved in the historical index library;
  • a picture extraction sub-module configured to extract the latest pictures whose picture index does not exist in the historical index library and save them to the full-quantity library.
  • the device further comprises:
  • An index adding module configured to add a picture index corresponding to the latest picture added to the full-size library to the historical index library.
  • the pictures in the full-quantity library are distributed across a plurality of storage areas of the server cluster according to multi-level picture categories, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with a corresponding picture identifier and multi-level picture category;
  • the picture feedback module includes:
  • a category parsing sub-module configured to parse the request for calling the picture to obtain the target multi-level picture category of the required target picture;
  • a target picture extraction sub-module configured to extract the target picture from the full-quantity library according to the storage location of the storage area corresponding to each level of the target multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
  • each day corresponds to one daily increasing gallery;
  • the device further comprises:
  • a gallery deletion module configured to delete the increasing galleries that fall outside a preset time period.
  • the device further comprises:
  • the query module is configured to determine an online picture corresponding to the item that is still used online by querying the item history access data, and/or determine an online picture that is still used online by querying the picture history calling data;
  • a picture deletion module configured to delete a picture in the full-quantity library other than the online picture.
  • the device further comprises:
  • a category finding module configured to find the picture categories whose modulo value equals the value corresponding to the current day, as the picture categories to be cleaned;
  • the picture deletion module is specifically configured to delete, for each picture category to be cleaned, the pictures in that category of the full-quantity library other than the online pictures.
  • the device further comprises:
  • a picture replacement module configured to, when the latest pictures in the increasing gallery that do not exist in the full-quantity library are saved to the full-quantity library by comparing picture indexes, save each latest picture whose corresponding original picture exists in the full-quantity library to the full-quantity library in place of that original picture.
  • the device further comprises:
  • a timeout processing module configured to, when the execution time of a transmission thread is detected to exceed a preset time, end that transmission thread and start a new transmission thread to perform the corresponding task in its place;
  • a network connection interruption processing module configured to monitor the network connection API and, when a network connection abnormality notification issued by the network connection API is captured, end all transmission threads and start a corresponding number of new transmission threads to perform the corresponding tasks in their place.
  • the picture feedback module is configured to search for the target picture from the full-quantity library, and extract a picture feature of the target picture and feed back to the application;
  • the picture index is a picture number and a picture identifier of the picture.
  • According to the embodiments of the present application, the full set of product pictures is stored in the full-quantity library of the distributed server cluster, which meets the storage capacity and data processing capability that processing and analysing massive pictures requires of the platform. The latest pictures are first stored in the increasing gallery, the new pictures that do not exist in the full-quantity library are determined by comparing picture indexes, and those new pictures are added to the full-quantity library. This avoids providing inaccurate product pictures to downstream applications and avoids occupying excess storage and computing resources.
  • A latest picture whose corresponding original picture exists in the full-quantity library may be saved to the full-quantity library in place of the original picture, so that old pictures are updated with new ones. After the picture required by the application is extracted, its picture features may further be extracted and fed back, which reduces the picture-processing load on the terminal where the application runs.
  • The embodiments of the present application support storing pictures in multiple storage areas of the server cluster according to their multi-level picture categories, so that pictures can be extracted by multi-level category, which greatly improves the efficiency of searching for data. Moreover, in each storage area, multiple pictures can be organized into one large file in order of picture number, which improves the efficiency of picture search and processing.
  • FIG. 1 is a flow chart showing a mass picture management method according to an embodiment of the present application
  • FIG. 2 is a flow chart showing a mass picture management method according to another embodiment of the present application.
  • FIG. 3 is a schematic flow chart showing the image transmission of the present application.
  • FIG. 4 shows a storage structure of a picture in one example of the present application
  • FIG. 5 is a schematic diagram showing a multi-level picture category in an example of the present application.
  • FIG. 6 is a schematic diagram showing the steps of picture cleaning in an example of the present application.
  • FIG. 7 is a flow chart showing the output of a picture in an example of the present application.
  • FIG. 8 is a structural block diagram of a mass picture management apparatus according to an embodiment of the present application.
  • FIG. 9 is a block diagram showing the structure of a mass picture management apparatus according to another embodiment of the present application.
  • Referring to FIG. 1, a flowchart of a massive picture management method according to an embodiment of the present application is shown.
  • the method may specifically include the following steps:
  • Step 101 Acquire a plurality of latest pictures updated on the current day.
  • the latest image updated on the current day may include a modified image for the original image, or a newly added image, such as all the images of the newly added product or the newly added image for the original product.
  • the latest pictures can be obtained in a variety of ways, such as monitoring the behavior of the client to update the picture, or obtaining the latest picture by accessing the related records of the picture update, and can also be used in any other applicable manner, and the application does not limit this.
  • Step 102 Upload the latest picture in parallel to a preset increasing library in a distributed server cluster by using multiple transmission threads, and a full-scale library is also deployed in the distributed server cluster.
  • The solution of the present application can be deployed on a Hadoop system. Hadoop is a distributed system infrastructure developed by the Apache Foundation; it lets users develop distributed programs without knowing the underlying details of the distribution, and take advantage of the cluster's high-speed computing and storage capabilities.
  • The core of Hadoop's framework design is HDFS (Hadoop Distributed File System) and MapReduce.
  • HDFS provides storage for massive amounts of data. It is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, which suits applications with large data sets.
  • MapReduce provides computation over massive amounts of data.
  • A unified interface can thus be used to provide input for downstream distributed picture-processing programs.
  • Step 103 Save, by comparing the picture index, the latest picture in the increasing gallery that does not exist in the full-quantity library to the full-quantity library.
  • the image is identified by a picture index, and the picture index can be any available data such as the identification and number of the picture.
  • the steps of index matching can be completed by using a MapReduce task.
  • Step 104 After receiving the request for the application to call the image, extract the target image from the full-quantity library and feed back to the application.
  • The application can send a request for calling a picture to the distributed server cluster; after receiving the request, the cluster looks up the requested picture and feeds it back.
  • Based on picture content, the application can implement functions such as same-picture detection, picture quality detection and product infringement detection, which is not limited in this application.
  • the step 103 may include:
  • Sub-step S1 comparing a picture index of the latest picture in the increasing gallery with a preset historical index library, where the picture index of all pictures in the full-quantity library is saved in the historical index library;
  • Sub-step S2: extracting the latest pictures whose picture index does not exist in the historical index library and saving them to the full-quantity library.
  • The application may use the historical index library to pre-store the picture indexes of all pictures in the full-quantity library. To determine which latest pictures do not exist in the full-quantity library, the picture indexes are compared: if the picture index of a latest picture is not found in the historical index library, that picture is saved to the full-quantity library.
  • The method further includes: adding the picture index of each latest picture added to the full-quantity library to the historical index library.
  • Adding these picture indexes keeps the historical index library up to date.
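  • The index comparison of sub-steps S1 and S2, together with the index update, can be illustrated with a minimal in-memory sketch. It assumes a picture index is a (picture number, picture identifier) pair and uses plain dictionaries and a set in place of the galleries and the historical index library; the name `merge_new_pictures` is hypothetical, and in the described system this comparison would run as a MapReduce task.

```python
from typing import Dict, Set, Tuple

# Assumption: a picture index is a (picture_number, picture_identifier) pair.
PictureIndex = Tuple[int, str]


def merge_new_pictures(
    incremental: Dict[PictureIndex, bytes],
    full: Dict[PictureIndex, bytes],
    history_index: Set[PictureIndex],
) -> int:
    """Copy pictures whose index is absent from history_index into `full`.

    Returns the number of pictures added. The history index is updated in
    step so that it keeps covering every picture in the full library.
    """
    added = 0
    for index, data in incremental.items():
        if index in history_index:
            continue  # already present in the full-quantity library
        full[index] = data
        history_index.add(index)
        added += 1
    return added
```

  • Running the merge twice on the same increasing gallery adds nothing the second time, because the historical index library already contains every index that was added.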
  • The pictures in the full-quantity library may be stored in multiple storage areas of the server cluster according to multi-level picture categories, so that a search only needs to read the storage areas of the relevant categories, which can greatly improve the efficiency of finding data.
  • the multi-level category can be set according to actual needs, which is not limited in this application.
  • the step 104 may include:
  • Sub-step S3: parsing the request for calling the picture to obtain the target multi-level picture category of the required target picture;
  • Sub-step S4: extracting the target picture from the full-quantity library according to the storage location of the storage area corresponding to each level of the target multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
  • The correspondence between the categories at each level and the storage locations of the storage areas is configured in advance. The request from the application is parsed to obtain the multi-level picture category of the picture to be extracted, and the target picture is then extracted from the full-quantity library at the corresponding storage location.
  • The picture library needs to provide flexible filtered access; for example, a user may need the picture corresponding to a certain picture identifier under a certain category. The picture library therefore does not put all pictures together but organizes them in a directory hierarchy by category.
  • Pictures are stored by category, with each category acting like a partition. When only the pictures under a certain third-level category need to be filtered, only the data of that third-level category needs to be taken as input, which greatly reduces the amount of data to process.
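  • One way to picture the pre-configured correspondence between category levels and storage locations is a directory path built from the category levels. The sketch below is only an illustration under that assumption; the root path `/gallery/full` and the function name `storage_dir` are hypothetical and do not come from the application.

```python
from pathlib import PurePosixPath
from typing import Sequence

# Hypothetical root directory of the full-quantity library.
GALLERY_ROOT = PurePosixPath("/gallery/full")


def storage_dir(categories: Sequence[str]) -> PurePosixPath:
    """Map a multi-level category, e.g. ("100", "120", "123"), to its directory."""
    if not categories:
        raise ValueError("at least one category level is required")
    # Each category level becomes one directory level, like a partition.
    return GALLERY_ROOT.joinpath(*categories)
```

  • A job that needs only one third-level category can then read that single directory as its input instead of scanning the whole library.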
  • SequenceFile is a binary file format provided by Hadoop that serializes data into files as <key, value> pairs.
  • In a specific application, the pictures of each storage area may be stored in order of their picture numbers as <K, V> records, where K is the picture number and V is the original picture data together with its metadata.
  • The metadata includes the picture identifier and the multi-level picture category to which the picture belongs; it is used to extract pictures by picture identifier and multi-level picture category, and provides a data filtering function that improves the efficiency of picture search and processing.
  • The picture identifier can be the MD5 value of the picture.
  • Extracting the target picture from the full-quantity library and feeding it back to the application is: searching for the target picture in the full-quantity library, extracting a picture feature of the target picture, and feeding the feature back to the application.
  • An alternative scheme does not store the original picture data on HDFS. Instead, after obtaining the original data of a picture, it extracts the required picture features, such as a histogram or SIFT features, and stores only this feature data on HDFS in order to reduce the amount of data transmitted.
  • The problem with that scheme is that the picture library cannot serve as a general data platform for picture processing and analysis tasks.
  • The picture features required by each picture-processing task may differ and cannot be enumerated in advance. If a task needs a feature that has not been stored, the task cannot be carried out in a short time, because extracting a feature from massive pictures is itself a huge workload.
  • Algorithm engineers would have to start from obtaining the pictures, then extract the features and upload them to HDFS, before they could apply their algorithms for analysis and processing.
  • This up-front feature preparation requires a lot of effort, so algorithm engineers cannot focus on the algorithm itself.
  • Because the present application stores the original picture data, an application can preset the feature extraction method it requires, and for common picture features users can directly call a preset distributed feature extraction program.
  • The need to extract various features can thus be met, so that for downstream picture processing and analysis tasks the picture library of the present application can provide data services as a public data support platform.
  • Algorithm engineers do not need to handle the storage of large numbers of pictures or feature extraction, and only need to attend to the algorithm itself; this division of labour ensures working efficiency. In addition, feeding back extracted picture features reduces the picture-processing load on the terminal where the application runs.
  • Referring to FIG. 2, a flowchart of a massive picture management method according to another embodiment of the present application is shown.
  • The method may specifically include the following steps:
  • Step 201: Obtain updated latest product information by parsing the product update record.
  • Step 202: Parse the link address of the latest picture from the latest product information, and obtain the latest picture according to the link address.
  • The link address of the latest picture can be obtained by parsing the latest product information, and the picture can then be fetched from its storage location according to the link address.
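  • Steps 201 and 202 can be sketched as follows. The application does not specify the format of the product update record, so this sketch assumes one JSON object per line with a hypothetical `image_urls` field; both the format and the function name `extract_picture_urls` are assumptions for illustration only.

```python
import json
from typing import Iterable, List


def extract_picture_urls(update_records: Iterable[str]) -> List[str]:
    """Collect de-duplicated picture URLs from JSON-encoded product update records.

    Assumption: each record is a JSON object whose optional "image_urls"
    field lists the link addresses of the product's pictures.
    """
    seen = set()
    urls: List[str] = []
    for line in update_records:
        product = json.loads(line)
        for url in product.get("image_urls", []):
            if url not in seen:  # the same picture may be shared by several products
                seen.add(url)
                urls.append(url)
    return urls
```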
  • Step 203 Acquire a plurality of latest pictures updated on the current day.
  • Step 204: Upload the latest pictures in parallel, through multiple transmission threads, to the daily increasing gallery preset in the distributed server cluster, where each day corresponds to one daily increasing gallery and a full-quantity library is also deployed in the distributed server cluster.
  • Step 205: When the execution time of a transmission thread is detected to exceed the preset time, end that transmission thread and start a new transmission thread to perform the corresponding task in its place.
  • Transmission timeouts may occur during picture transmission, greatly increasing the overall transmission time or even causing the transmission to fail, so timeout control is necessary.
  • A transmission time upper limit, or timeout period, may be set in advance for each transmission thread. If a transmission thread has not finished when the limit is reached, a transmission timeout is determined. Each timed-out transmission thread is forcibly shut down and a new transmission thread is started to perform the task in its place, so that timeouts are discovered and resolved promptly and a large number of pictures can be transferred to the distributed server cluster in the shortest possible time.
  • Step 206: Monitor the network connection API and, when a network connection abnormality notification issued by the network connection API is captured, end the transmission threads and start new transmission threads to perform the corresponding tasks in their place.
  • Picture transmission may suffer network disturbances that break the connection to the distributed server cluster and interrupt the transmission. The network connection therefore needs to be monitored during transmission, and the transmission task retried when the connection is interrupted, so that network interruptions are discovered and resolved promptly and a large number of pictures can be transferred to the distributed server cluster in the shortest possible time.
  • The preferred approach of the present application is to end all current transmission threads and start a corresponding number of new threads, each taking over one of the closed transmission tasks.
  • Network interruptions can be discovered by monitoring the network connection API.
  • The network connection function is implemented through the API (Application Programming Interface) of the Java language; when the network connection is interrupted, the API issues an exception notification, and capturing this exception notification makes it possible to determine that a network interruption has occurred.
  • The present application handles connection interruptions and timeouts by retrying. Since unlimited retries are not possible, a maximum number of retries can be set, for example 3. If the task still cannot be completed after 3 retries, the transmission of that picture is skipped.
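  • The timeout control and bounded retry described above can be sketched with Python's standard `concurrent.futures` module. This is an illustration under stated assumptions rather than the application's Java implementation: `upload_with_retry` and the `upload` callable are hypothetical names, a `ConnectionError` stands in for the network abnormality notification, and because Python cannot forcibly kill a thread, a timed-out worker is simply abandoned and a fresh one is started.

```python
import concurrent.futures as cf
from typing import Callable, Optional

MAX_RETRIES = 3  # example upper bound taken from the text


def upload_with_retry(
    upload: Callable[[str], bytes],
    url: str,
    timeout_s: float,
    max_retries: int = MAX_RETRIES,
) -> Optional[bytes]:
    """Run upload(url) in a worker thread; retry on timeout or connection error.

    Returns the upload result, or None if the picture is skipped after the
    initial attempt plus max_retries retries.
    """
    for _attempt in range(1 + max_retries):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(upload, url)
        try:
            return future.result(timeout=timeout_s)
        except (cf.TimeoutError, ConnectionError):
            continue  # start a new worker for the same task
        finally:
            # Do not wait for a possibly stuck worker before moving on.
            pool.shutdown(wait=False)
    return None
```

  • Returning `None` mirrors the rule that a picture is skipped once the retry budget is exhausted, so one bad link cannot stall the whole daily upload.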
  • Step 207: By comparing picture indexes, save the latest pictures in the increasing gallery that do not exist in the full-quantity library to the full-quantity library, and save each latest picture whose corresponding original picture exists in the full-quantity library to the full-quantity library in place of that original picture.
  • In this way, a latest picture whose corresponding original picture exists in the full-quantity library replaces the original picture, thereby updating old pictures with new ones.
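  • A minimal sketch of how new pictures might be told apart from replacements. The application does not state how a changed picture is detected, so this sketch assumes the index maps a picture ID to the MD5 code of its content, and treats a known ID with a different MD5 as a replacement; `classify_pictures` is a hypothetical name.

```python
import hashlib
from typing import Dict, List, Tuple


def classify_pictures(
    incremental: Dict[int, bytes],
    index: Dict[int, str],
) -> Tuple[List[int], List[int]]:
    """Split incremental picture IDs into (new, changed) against an ID -> MD5 index.

    Assumption: same ID with a different MD5 means the original picture
    was modified and should be replaced in the full library.
    """
    new_ids: List[int] = []
    changed_ids: List[int] = []
    for pic_id, data in sorted(incremental.items()):
        digest = hashlib.md5(data).hexdigest()
        if pic_id not in index:
            new_ids.append(pic_id)        # not in the full library yet
        elif index[pic_id] != digest:
            changed_ids.append(pic_id)    # replaces the original picture
    return new_ids, changed_ids
```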
  • Step 208 After receiving the request for the application to call the image, the target image is extracted from the full-quantity library and fed back to the application.
  • Step 209: Delete the increasing galleries that fall outside the preset time period.
  • The increasing gallery does not need to retain data for many days. A retention period can be set, for example 7 days, and the expired increasing galleries are deleted accordingly.
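  • The retention rule can be sketched as a date filter. The sketch assumes each daily increasing gallery is identified by its date and that a gallery exactly as old as the retention period is still kept; the 7-day value is the example from the text, and `expired_galleries` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Iterable, List

RETENTION_DAYS = 7  # example period from the text


def expired_galleries(
    gallery_dates: Iterable[date],
    today: date,
    retention_days: int = RETENTION_DAYS,
) -> List[date]:
    """Return the dates of increasing galleries older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    # Assumption: a gallery dated exactly at the cutoff is still retained.
    return sorted(d for d in gallery_dates if d < cutoff)
```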
  • Step 210 Determine an online picture corresponding to the item still in use by querying the item history access data, and/or determine an online picture that is still used online by querying the picture history calling data.
  • Step 211 Delete a picture other than the online picture in the full quantity gallery.
  • For example, picture categories can be assigned to days of the week by taking the category ID modulo 7, so that category A is the object to be cleaned on Tuesday and category B is the object to be cleaned on Monday. If the current day is a Tuesday, category A is cleaned.
  • The correspondence between the modulo-7 result of the category ID and the day of the week can also be set in any other suitable manner, which is not limited in this application.
  • Deleting the pictures other than the online pictures in the full-quantity library is then: for each picture category to be cleaned, deleting the pictures in that category of the full-quantity library other than the online pictures.
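  • The modulo-7 selection can be sketched in a few lines. Since the text leaves the exact mapping between the modulo result and the day of the week open, this sketch assumes Python's `date.weekday()` numbering (Monday = 0 through Sunday = 6); `categories_to_clean` is a hypothetical name.

```python
from datetime import date
from typing import Iterable, List


def categories_to_clean(category_ids: Iterable[int], today: date) -> List[int]:
    """Select the categories whose ID modulo 7 equals today's weekday number.

    Assumption: Monday = 0 ... Sunday = 6, as returned by date.weekday().
    """
    weekday = today.weekday()
    return [cid for cid in category_ids if cid % 7 == weekday]
```

  • Over any seven consecutive days every category ID is selected exactly once, which is what spreads the cleaning load across the week.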
  • To build the picture library from the product information, the product data to be processed is first obtained, and the URLs of the pictures are obtained by parsing the product data. The URLs are then divided into N shares, and the high-reliability transmission program is called to write the pictures to HDFS.
  • Each of the resulting SequenceFiles is handled by a corresponding picture upload unit, and the upload units work in parallel to speed up the transfer of pictures.
  • All upload units upload the pictures to a temporary increasing gallery on HDFS, which stores the pictures of all the products obtained in the first step.
  • The transmission program is highly reliable: a disconnect-and-reconnect mechanism and a timeout control mechanism ensure that a large number of pictures can be transmitted to HDFS in the shortest possible time.
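  • The split-and-upload flow can be sketched with a thread pool. This is a local illustration only: `fetch` is a hypothetical callable standing in for the download of one picture, a dictionary stands in for the temporary increasing gallery on HDFS, and the round-robin split is one possible way to form the N shares.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split_into_shares(urls: Sequence[str], n: int) -> List[List[str]]:
    """Distribute URLs round-robin into n shares of near-equal size."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [list(urls[i::n]) for i in range(n)]


def upload_all(
    urls: Sequence[str],
    n: int,
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Process each share in its own worker thread and collect the results."""

    def upload_share(share: List[str]) -> Dict[str, bytes]:
        # One upload unit handles one share of the URL list.
        return {url: fetch(url) for url in share}

    gallery: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=n) as pool:
        for partial in pool.map(upload_share, split_into_shares(urls, n)):
            gallery.update(partial)
    return gallery
```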
  • Index comparison is then used to build the daily increasing gallery.
  • The index stores the ID of each picture in the picture library and the MD5 code of the picture.
  • The pictures in the temporary directory are compared against the index library to obtain the picture data that is not yet in the index, which forms the day's increasing content.
  • FIG. 4 shows a storage structure of a picture in one example of the present application
  • FIG. 5 shows a schematic diagram of a multi-level picture class in one example of the present application.
  • SequenceFile stores data in K-V format.
  • K is the picture ID.
  • V is the picture raw data (binary data) together with its metadata, forming the storage structure shown in FIG. 4.
  • The metadata of a picture includes the MD5 code of the picture, the category of the product corresponding to the picture, and so on; the metadata can provide a data filtering function in subsequent picture processing.
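  • To make the value layout concrete, the sketch below packs metadata and raw picture bytes into one value and unpacks them again. It does not use Hadoop's SequenceFile API; the length-prefixed JSON layout and the names `encode_value` and `decode_value` are assumptions chosen only to illustrate a V that carries both metadata and raw data.

```python
import hashlib
import json
import struct
from typing import List, Tuple


def encode_value(raw: bytes, categories: List[str]) -> bytes:
    """Pack metadata (MD5 code, product categories) and raw picture bytes into one value."""
    meta = {"md5": hashlib.md5(raw).hexdigest(), "categories": categories}
    meta_bytes = json.dumps(meta).encode("utf-8")
    # Layout: 4-byte big-endian metadata length, metadata, raw picture bytes.
    return struct.pack(">I", len(meta_bytes)) + meta_bytes + raw


def decode_value(value: bytes) -> Tuple[dict, bytes]:
    """Split a packed value back into (metadata, raw picture bytes)."""
    (meta_len,) = struct.unpack(">I", value[:4])
    meta = json.loads(value[4:4 + meta_len].decode("utf-8"))
    return meta, value[4 + meta_len:]
```

  • Because the metadata sits in front of the raw data, a filtering job can inspect the MD5 code or category without decoding the picture itself.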
  • The picture library needs to provide flexible filtered access; for example, a user may need the pictures corresponding to the product IDs under a certain category. The pictures are therefore not all put together but organized in the directory structure shown in FIG. 5, stored by category with each category acting like a partition.
  • The files image01.seq, image02.seq and image03.seq are stored under a four-level category beneath the root directory of the picture library, so that when only the pictures under a certain fourth-level category need to be filtered, only the data of that fourth-level category needs to be taken as input, which greatly reduces the amount of data to process.
  • the image update includes three aspects:
  • The full-quantity library is updated daily in the present application: the daily increasing gallery can be merged directly into the full-quantity library.
  • Cleaning deletes the "zombie pictures" in the full-quantity library. Because the amount of data in the full-quantity library is huge, this cleaning cannot be performed all at once. A policy of batch cleaning by category is therefore adopted: each day, the categories whose category ID modulo 7 equals the day of the week are cleaned, so that every category is cleaned once within a week.
  • FIG. 6 is a schematic diagram showing the steps of image cleaning in an example of the present application, which specifically includes:
  • Step 1 Determine whether the category is cleaned up on the same day.
  • Step 2 Prepare a list of valid picture IDs.
  • Step 3 Run the MapReduce cleanup task.
  • Step 4 Replace the original data with the cleaned data.
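  • Steps 2 to 4 can be illustrated on a single local file. In the described system the filtering runs as a MapReduce task over HDFS data; this sketch instead assumes a JSON-lines file with an `id` field per record, writes the cleaned copy next to the original and then swaps it in. The file format and the name `clean_category_file` are hypothetical.

```python
import json
import os
import tempfile
from typing import Set


def clean_category_file(path: str, valid_ids: Set[int]) -> int:
    """Rewrite a JSON-lines record file, keeping only records with a valid picture ID.

    Returns the number of records removed. The cleaned copy is written to a
    temporary file in the same directory and then swapped in with os.replace.
    """
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".cleaned")
    with os.fdopen(fd, "w", encoding="utf-8") as out, \
            open(path, encoding="utf-8") as src:
        for line in src:
            if json.loads(line)["id"] in valid_ids:
                out.write(line)   # picture is still used online
            else:
                removed += 1      # "zombie picture", dropped
    os.replace(tmp_path, path)    # swap the cleaned data in for the original
    return removed
```

  • Writing the cleaned data separately and swapping it in as the last step means readers never see a half-cleaned category.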
  • FIG. 7 is a schematic flow chart of the picture output in an example of the present application, which specifically includes:
  • Step 1 Determine the list of required picture IDs.
  • the downstream program provides the required picture ID as input to the picture output step.
  • Step 2 Filter the gallery and get the image data.
  • Based on the picture ID list, the required picture data is obtained from the picture library.
  • The picture IDs are compared against the library data in a distributed manner through a MapReduce task to obtain the result.
  • Step 3 Extract picture features.
  • Feature extraction can be performed through a distributed MapReduce task, using either a built-in picture feature extraction method or a picture feature extraction algorithm defined by the downstream program; the extracted features are used as the input of the downstream picture-processing task.
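  • The three output steps can be sketched as a filter followed by a pluggable extractor. The sketch runs in memory rather than as a MapReduce task, and `byte_histogram` is a deliberately simple stand-in for a real picture feature such as a colour histogram; both function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def byte_histogram(raw: bytes, bins: int = 4) -> List[int]:
    """Toy feature: histogram of byte values (a stand-in for a real image histogram)."""
    hist = [0] * bins
    for b in raw:
        hist[b * bins // 256] += 1
    return hist


def export_features(
    gallery: Dict[int, bytes],
    wanted_ids: Iterable[int],
    extractor: Callable[[bytes], List[int]] = byte_histogram,
) -> Dict[int, List[int]]:
    """Filter the gallery by picture ID, then apply the feature extractor."""
    wanted = set(wanted_ids)
    return {pid: extractor(raw) for pid, raw in gallery.items() if pid in wanted}
```

  • Passing the extractor as a parameter reflects the choice between a built-in extraction method and an algorithm supplied by the downstream program.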
  • Referring to FIG. 8, a structural block diagram of a massive picture management apparatus according to an embodiment of the present application is shown, which may specifically include:
  • a picture acquisition module 301 configured to acquire a plurality of latest pictures updated on the current day;
  • a picture uploading module 302 configured to upload the latest pictures in parallel, through multiple transmission threads, to the daily increasing gallery preset in the distributed server cluster, where a full-quantity library is also deployed in the distributed server cluster;
  • a picture saving module 303 configured to save, by comparing picture indexes, the latest pictures in the increasing gallery that do not exist in the full-quantity library to the full-quantity library;
  • a picture feedback module 304 configured to, after receiving a request from an application to call a picture, extract the target picture from the full-quantity library and feed it back to the application.
  • the picture saving module includes:
  • An index matching sub-module configured to compare a picture index of a latest picture in the increasing gallery with a preset historical index library, where the image index of all pictures in the full-quantity library is saved in the historical index library;
  • a picture extraction sub-module configured to extract a picture index that does not exist in the historical index library and save the latest picture to the The full library.
  • the device further includes:
  • An index adding module configured to add a picture index corresponding to the latest picture added to the full-size library to the historical index library.
  • the pictures in the full-quantity library are stored in a plurality of storage areas of the server cluster according to the multi-level picture category, and the pictures in each storage area are sequentially stored according to the corresponding picture number.
  • Each picture is marked with a corresponding picture identifier and a multi-level picture category;
  • the picture feedback module includes:
  • a class analysis sub-module configured to parse the request for calling the image to carry a target multi-level picture category of the required target picture
  • the target image is extracted from the full library.
  • the image feedback module is specifically configured to search for the target image from the full-quantity library, and extract a picture feature of the target image to feed back to the application;
  • the picture index is a picture number and a picture identifier of the picture.
  • the full amount of product images are stored in the full-scale library of the distributed service cluster, which satisfies the requirements for processing and analyzing the massive image and the storage capacity and data processing capability of the platform; Store to the increasing gallery, determine the new image that does not exist in the full library by comparing the image index, and add the determined new image to the full library, which avoids the inaccurate product image provided to the downstream application and occupies more storage. Problems with resources and computing resources.
  • Referring to FIG. 9, a structural block diagram of a massive picture management apparatus according to another embodiment of the present application is shown, which may specifically include:
  • a latest product parsing module 401, configured to obtain, before the plurality of latest pictures updated on the current day are acquired, the correspondingly updated latest product information by parsing product update records;
  • a link address access module 402, configured to parse the link addresses of the latest pictures from the latest product information and acquire the latest pictures according to the link addresses;
  • a picture acquiring module 403, configured to acquire a plurality of latest pictures updated on the current day;
  • a picture uploading module 404, configured to upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
  • a timeout processing module 405, configured to end a transmission thread when its execution time is detected to exceed a preset time, and to start a new transmission thread to carry out the corresponding task in its place;
  • a network disconnection processing module 406, configured to monitor a network connection API and, when a network connection exception notification issued by the network connection API is caught, end all transmission threads and start multiple new transmission threads to carry out the corresponding tasks in their place;
  • a picture saving module 407, configured to save to the full library, by comparing picture indexes, the latest pictures in the daily-increment library that do not exist in the full library, and to save to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library;
  • a picture feedback module 408, configured to extract, after receiving a request from an application to call pictures, the target pictures from the full library and return them to the application;
  • a library deletion module 409, configured to delete the daily-increment libraries that fall outside a preset time period;
  • a query module 410, configured to determine, by querying historical product access data, the online pictures corresponding to products still in use online, and/or to determine, by querying historical picture call data, the online pictures still in use online;
  • a picture deletion module 411, configured to delete the pictures in the full library other than the online pictures.
  • a category search module, configured to find, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
  • The picture deletion module is specifically configured to delete from the full library, for the picture category to be cleaned, the pictures under that category other than the online pictures.
  • According to the embodiments of the present application, the full set of product pictures is stored in the full library of a distributed server cluster, which meets the demands that processing and analyzing massive numbers of pictures place on the platform's storage and data-processing capacity. The latest pictures updated each day are stored in the daily-increment library, the new pictures that do not exist in the full library are determined by comparing picture indexes, and those new pictures are added to the full library. This avoids providing inaccurate product pictures to downstream applications and consuming excessive storage and computing resources.
  • A latest picture whose corresponding original picture exists in the full library can be saved to the full library in place of the original, so that old pictures are updated with new ones. After the latest pictures required by an application have been extracted, picture features can further be extracted and returned, which reduces the picture-processing load on the terminal where the application runs.
  • The embodiments of the present application support storing pictures in multiple storage areas of the server cluster according to their multi-level picture categories, so that a later lookup can extract pictures by multi-level category alone, which greatly improves lookup efficiency. In addition, within each storage area, multiple pictures can be organized by picture number into one large file for storage, which improves the efficiency of picture lookup and processing.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present application can be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the massive picture management apparatus according to embodiments of the present application.
  • the application can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

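A method and apparatus for managing massive numbers of pictures. The method includes: acquiring a plurality of latest pictures updated on the current day (100, 203); uploading the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, in which a full library is also deployed (102); by comparing picture indexes, saving to the full library the latest pictures in the daily-increment library that do not exist in the full library (103); and, after receiving a request from an application to call pictures, extracting the target pictures from the full library and returning them to the application (104, 208). The method avoids providing inaccurate product pictures to downstream applications and consuming excessive storage and computing resources.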

Description

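Massive picture management method and apparatus
This application claims priority to Chinese patent application No. 201510849675.8, filed on November 27, 2015 and entitled "Massive picture management method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a massive picture management method and a massive picture management apparatus.
Background
An online trading platform offers a large number of products for sale, and each product has at least one corresponding picture. Taking AliExpress as an example, the platform carries about 150 million products. Each product has 1 to 6 main pictures shown on search, shopping-guide and similar pages, plus several detail pictures describing the product, and as the business grows a large number of new pictures are published to the platform every day.
Pictures support many kinds of processing and analysis, for example judging from picture content whether two products are similar or identical, assessing picture quality from picture content, or identifying whether a product infringes.
The existing problems are as follows. On the one hand, processing and analyzing massive numbers of pictures places high demands on the platform's storage and data-processing capacity. On the other hand, the large number of pictures updated each day are not marked with their relationship to the original pictures, so it is impossible to know exactly which pictures are new. Current picture storage simply merges all updated pictures into the picture library, so the product pictures called by downstream applications are inaccurate and considerable computing and storage resources are wasted on duplicate pictures.
Summary
In view of the above problems, the present application is proposed to provide a massive picture management method and a corresponding massive picture management apparatus that overcome, or at least partially solve, the above problems.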
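According to one aspect of the present application, a massive picture management method is provided, including:
acquiring a plurality of latest pictures updated on the current day;
uploading the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
by comparing picture indexes, saving to the full library the latest pictures in the daily-increment library that do not exist in the full library;
after receiving a request from an application to call pictures, extracting the target pictures from the full library and returning them to the application.
Preferably, before the acquiring of the plurality of latest pictures updated on the current day, the method further includes:
obtaining the correspondingly updated latest product information by parsing product update records;
parsing the link addresses of the latest pictures from the latest product information, and acquiring the latest pictures according to the link addresses.
Preferably, the saving to the full library, by comparing picture indexes, of the latest pictures in the daily-increment library that do not exist in the full library includes:
comparing the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
extracting the latest pictures whose picture indexes do not exist in the historical index library and saving them to the full library.
Preferably, the method further includes:
adding to the historical index library the picture indexes of the latest pictures added to the full library.
Preferably, the pictures in the full library are distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with its picture identifier and its multi-level picture category;
the extracting of the target pictures from the full library and returning them to the application after receiving the request includes:
parsing the request to obtain the target multi-level picture category of the required target pictures carried in the request;
extracting the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
Preferably, each day corresponds to one daily-increment library, and the method further includes:
deleting the daily-increment libraries that fall outside a preset time period.
Preferably, the method further includes:
determining, by querying historical product access data, the online pictures corresponding to products still in use online, and/or determining, by querying historical picture call data, the online pictures still in use online;
deleting the pictures in the full library other than the online pictures.
Preferably, the method further includes:
finding, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
the deleting of the pictures in the full library other than the online pictures being: for the picture category to be cleaned, deleting from the full library the pictures under that category other than the online pictures.
Preferably, while the latest pictures in the daily-increment library that do not exist in the full library are saved to the full library by comparing picture indexes, the method further includes:
saving to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library.
Preferably, the method further includes:
when the execution time of a transmission thread is detected to exceed a preset time, ending that transmission thread and starting a new transmission thread to carry out the corresponding task in its place;
and/or monitoring a network connection API and, when a network connection exception notification issued by the network connection API is caught, ending all transmission threads and starting multiple new transmission threads to carry out the corresponding tasks in their place.
Preferably, the extracting of the target pictures from the full library and returning them to the application is: looking up the target pictures in the full library, extracting picture features of the target pictures and returning the features to the application;
the picture index is the picture number and picture identifier of the picture.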
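The present application further provides a massive picture management apparatus, including:
a picture acquiring module, configured to acquire a plurality of latest pictures updated on the current day;
a picture uploading module, configured to upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
a picture saving module, configured to save to the full library, by comparing picture indexes, the latest pictures in the daily-increment library that do not exist in the full library;
a picture feedback module, configured to extract, after receiving a request from an application to call pictures, the target pictures from the full library and return them to the application.
Preferably, the apparatus further includes:
a latest product parsing module, configured to obtain, before the plurality of latest pictures updated on the current day are acquired, the correspondingly updated latest product information by parsing product update records;
a link address access module, configured to parse the link addresses of the latest pictures from the latest product information and acquire the latest pictures according to the link addresses.
Preferably, the picture saving module includes:
an index comparison sub-module, configured to compare the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
a picture extraction sub-module, configured to extract the latest pictures whose picture indexes do not exist in the historical index library and save them to the full library.
Preferably, the apparatus further includes:
an index adding module, configured to add to the historical index library the picture indexes of the latest pictures added to the full library.
Preferably, the pictures in the full library are distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with its picture identifier and its multi-level picture category;
the picture feedback module includes:
a category parsing sub-module, configured to parse the request to obtain the target multi-level picture category of the required target pictures carried in the request;
a category-based extraction sub-module, configured to extract the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
Preferably, each day corresponds to one daily-increment library, and the apparatus further includes:
a library deletion module, configured to delete the daily-increment libraries that fall outside a preset time period.
Preferably, the apparatus further includes:
a query module, configured to determine, by querying historical product access data, the online pictures corresponding to products still in use online, and/or to determine, by querying historical picture call data, the online pictures still in use online;
a picture deletion module, configured to delete the pictures in the full library other than the online pictures.
Preferably, the apparatus further includes:
a category search module, configured to find, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
the picture deletion module being specifically configured to delete from the full library, for the picture category to be cleaned, the pictures under that category other than the online pictures.
Preferably, the apparatus further includes:
a picture replacement module, configured to save to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library, while the latest pictures in the daily-increment library that do not exist in the full library are saved to the full library by comparing picture indexes.
Preferably, the apparatus further includes:
a timeout processing module, configured to end a transmission thread when its execution time is detected to exceed a preset time, and to start a new transmission thread to carry out the corresponding task in its place;
and/or a network disconnection processing module, configured to monitor a network connection API and, when a network connection exception notification issued by the network connection API is caught, end all transmission threads and start multiple new transmission threads to carry out the corresponding tasks in their place.
Preferably, the picture feedback module is specifically configured to look up the target pictures in the full library, extract picture features of the target pictures and return the features to the application;
the picture index is the picture number and picture identifier of the picture.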
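According to the embodiments of the present application, the full set of product pictures is stored in the full library of a distributed server cluster, which meets the demands that processing and analyzing massive numbers of pictures place on the platform's storage and data-processing capacity. The latest pictures updated each day are stored in the daily-increment library, the new pictures that do not exist in the full library are determined by comparing picture indexes, and those new pictures are added to the full library. This avoids providing inaccurate product pictures to downstream applications and consuming excessive storage and computing resources.
In the embodiments of the present application, a latest picture whose corresponding original picture exists in the full library can be saved to the full library in place of the original, so that old pictures are updated with new ones. After the latest pictures required by an application have been extracted, picture features can further be extracted and returned, which reduces the picture-processing load on the terminal where the application runs.
The embodiments of the present application support storing pictures in multiple storage areas of the server cluster according to their multi-level picture categories, so that a later lookup can extract pictures by multi-level category alone, which greatly improves lookup efficiency. In addition, within each storage area, multiple pictures can be organized by picture number into one large file for storage, which improves the efficiency of picture lookup and processing.
The above is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments of the application are set out below.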
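Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flow chart of a massive picture management method according to an embodiment of the present application;
FIG. 2 shows a flow chart of a massive picture management method according to another embodiment of the present application;
FIG. 3 shows a schematic flow chart of picture transmission in the present application;
FIG. 4 shows the storage structure of pictures in an example of the present application;
FIG. 5 shows a schematic diagram of multi-level picture categories in an example of the present application;
FIG. 6 shows a schematic diagram of the picture cleanup steps in an example of the present application;
FIG. 7 shows a schematic flow chart of picture output in an example of the present application;
FIG. 8 shows a structural block diagram of a massive picture management apparatus according to an embodiment of the present application;
FIG. 9 shows a structural block diagram of a massive picture management apparatus according to another embodiment of the present application.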
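Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
Referring to FIG. 1, a flow chart of a massive picture management method according to an embodiment of the present application is shown. The method may specifically include the following steps:
Step 101: acquire a plurality of latest pictures updated on the current day.
The latest pictures updated on the current day may include pictures obtained by modifying original pictures as well as newly added pictures, for example all pictures of a newly added product or pictures newly added to an existing product. The latest pictures can be acquired in many ways, for example by monitoring clients' picture-update behavior or by reading records related to picture updates, or in any other suitable way; the present application places no limitation on this.
Step 102: upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster.
Traditional picture storage and processing usually take place on a single server, which cannot meet the needs of massive numbers of pictures. By deploying the full library that stores all pictures on a distributed server cluster, the present application can meet the requirements of storing and processing massive numbers of pictures.
In a specific implementation, the solution of the present application may preferably be deployed on a Hadoop system. Hadoop is a distributed-system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without understanding the low-level details of distribution and make full use of a cluster's high-speed computing and storage capacity. The core of the Hadoop framework is HDFS (Hadoop Distributed File System) and MapReduce. HDFS provides storage for massive data: it is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, which suits applications with large data sets. MapReduce provides computation over massive data.
As a currently fairly reliable distributed framework, Hadoop makes it convenient to write distributed programs. However, to process pictures in a distributed manner on Hadoop, the pictures must first be transferred to HDFS. As data volume grows, transfer time increases, and uploading massive data to HDFS takes a great deal of time. Compared with a single thread, the multi-threaded transfer in the present application improves transfer efficiency.
Further, a full picture library and a daily-increment library for each day need to be maintained on HDFS and kept updated daily. As the data input for distributed picture-processing tasks, they can supply input to downstream distributed picture-processing programs through a unified interface.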
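Step 103: by comparing picture indexes, save to the full library the latest pictures in the daily-increment library that do not exist in the full library.
Pictures are identified by picture indexes; a picture index may be any usable data such as the picture's identifier or number.
Because the daily-increment library may contain pictures whose corresponding original pictures already exist in the full library, it is necessary to determine which latest pictures do not exist in the full library and save those to the full library.
Preferably, when the present application is implemented on a Hadoop system, the index comparison step can be carried out by a MapReduce task.
Step 104: after receiving a request from an application to call pictures, extract the target pictures from the full library and return them to the application.
An application can send a request to call pictures to the distributed server cluster; after the request is received, the pictures requested by the application are looked up and returned. The application here may implement functions such as identifying same-item products from pictures, picture quality inspection, and product infringement detection based on picture content; the present application places no limitation on this.
In the embodiment of the present application, preferably, step 103 may include:
Sub-step S1: compare the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
Sub-step S2: extract the latest pictures whose picture indexes do not exist in the historical index library and save them to the full library.
The present application may use a historical index library to store, in advance, the picture indexes of all pictures in the full library. To determine the latest pictures that do not exist in the full library, picture indexes are compared: if the picture index of a picture in the daily-increment library is not found among those of the full library, that picture can be saved to the full library.
In the embodiment of the present application, preferably, the method further includes:
adding to the historical index library the picture indexes of the latest pictures added to the full library.
After the latest pictures that do not exist in the full library have been determined, their picture indexes can be added to the historical index library to keep it up to date.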
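The index comparison and index update can be illustrated in a few lines. This is a minimal single-process sketch, not the MapReduce task the text describes; the function name `merge_daily_into_full`, the in-memory dictionaries standing in for the libraries, and the `(picture number, MD5)` index tuple are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Assumed index shape: (picture number, picture identifier such as an MD5 hex digest).
PictureIndex = Tuple[int, str]


def merge_daily_into_full(
    daily: Dict[PictureIndex, bytes],
    full: Dict[PictureIndex, bytes],
    history_index: Set[PictureIndex],
) -> int:
    """Save to `full` every daily picture whose index is absent from `history_index`.

    The historical index is updated in place so that the next daily run
    filters against the new state. Returns the number of pictures added.
    """
    added = 0
    for index, data in daily.items():
        if index in history_index:
            continue  # already present in the full library
        full[index] = data
        history_index.add(index)
        added += 1
    return added
```

Running the function twice with the same daily data adds nothing the second time, which is the de-duplication the historical index library is meant to provide.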
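In the embodiment of the present application, preferably, the pictures in the full library may be distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong. A later picture lookup can then extract pictures by multi-level category alone, which greatly improves lookup efficiency. The multi-level categories can be set according to actual needs; the present application places no limitation on this.
Correspondingly and preferably, step 104 may include:
Sub-step S3: parse the request to obtain the target multi-level picture category of the required target pictures carried in the request;
Sub-step S4: extract the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
The correspondence between each category level and its storage location in the storage areas is configured in advance. The application's request to call pictures is parsed to obtain the multi-level picture category of the pictures to be extracted, and the target pictures are then extracted from the full library at the corresponding storage location.
Because the picture library needs to offer flexible filtered access (for example, a user may need the picture corresponding to a certain picture identifier under a certain category), the pictures in this library are not all stored together. Instead, they are stored hierarchically by category in the directory organization described below, like individual partitions. When only certain pictures under a given third-level category are needed, only that third-level category's data has to be taken as input, which greatly reduces the amount of data processed.
Pictures are all small files, and a large number of small files greatly reduces the processing efficiency of the Hadoop platform. Hadoop's file system is well suited to processing and storing large files, whereas many small files are ill suited to processing in Hadoop. The many small files can therefore be organized into one large file for storage by means of the SequenceFile provided by Hadoop. SequenceFile is a binary file format provided by Hadoop that serializes data to a file in the form of <key, value> pairs. As applied in the present application, the pictures in each storage area can be stored in order of their picture numbers, and each picture can be marked with its picture identifier and its multi-level picture category so that pictures can be extracted by identifier and category. K is the picture number, and V is the picture's raw data together with its metadata, which includes the picture identifier and the multi-level category. In later picture processing this metadata can be used for data filtering, which improves the efficiency of picture lookup and processing. The picture identifier may be the picture's MD5 value.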
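The key-value layout described above can be sketched as follows. This is only an illustration of the record structure (key = picture number, value = metadata plus raw bytes); it does not write the real Hadoop SequenceFile format. The value layout (a 4-byte length prefix followed by JSON metadata and then the raw bytes) and the names `make_record`, `parse_value` and `pack` are assumptions for this sketch.

```python
import hashlib
import json
import struct
from typing import Iterable, List, Tuple


def make_record(picture_number: int, raw: bytes, categories: List[int]) -> Tuple[int, bytes]:
    """Build one <key, value> pair: key is the picture number, value is metadata + raw data."""
    meta = json.dumps(
        {"md5": hashlib.md5(raw).hexdigest(), "categories": categories}
    ).encode("utf-8")
    # Assumed value layout: 4-byte big-endian metadata length, metadata, raw picture bytes.
    return picture_number, struct.pack(">I", len(meta)) + meta + raw


def parse_value(value: bytes) -> Tuple[dict, bytes]:
    """Split a value back into its metadata dictionary and the raw picture bytes."""
    (meta_len,) = struct.unpack(">I", value[:4])
    meta = json.loads(value[4:4 + meta_len].decode("utf-8"))
    return meta, value[4 + meta_len:]


def pack(pictures: Iterable[Tuple[int, bytes, List[int]]]) -> List[Tuple[int, bytes]]:
    """Order records by picture number, as each storage area stores pictures in number order."""
    return sorted(make_record(number, raw, cats) for number, raw, cats in pictures)
```

Because the metadata travels with every record, a later job can filter by MD5 or category without decoding the picture itself.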
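In the embodiment of the present application, preferably, extracting the target pictures from the full library and returning them to the application means looking up the target pictures in the full library, extracting picture features of the target pictures, and returning the features to the application.
Consider, by comparison, a scheme that stores a library of picture features on HDFS rather than the raw picture data. Such a scheme does not store the raw picture data on HDFS; instead, after obtaining the raw data, it extracts the required picture features, such as histograms or SIFT, and stores this feature data on HDFS in order to reduce the amount of data transferred. The problem with such a scheme is that the picture library cannot serve as a general data platform for picture processing and analysis tasks. The features each task needs may differ and cannot be enumerated one by one; if a task needs a feature that does not exist, the task cannot be carried out in a short time, because extracting features from massive numbers of pictures is itself a huge amount of work. Moreover, algorithm engineers must start from how to obtain the pictures, then extract features, then upload them to HDFS, and only then can they apply algorithms for analysis and processing. The up-front feature preparation takes a great deal of effort and prevents them from concentrating on applying algorithms.
Because the present application stores the raw picture data, an application can preset the feature extraction method it needs; alternatively, for some commonly used picture features, a preset general-purpose distributed feature extraction program is provided that users can call directly. This satisfies the need to extract all kinds of features, so that the picture library of the present application can serve as a common data support platform for downstream picture processing and analysis tasks. With this unified picture output and the built-in feature processing algorithms, data can be supplied to downstream picture-processing tasks quickly and conveniently. Algorithm engineers need not concern themselves with storing large numbers of pictures or extracting features and can focus on the algorithms themselves, which realizes a division of labor by specialty and keeps the work efficient. Returning extracted picture features also reduces the picture-processing load on the terminal where the application runs.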
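Referring to FIG. 2, a flow chart of a massive picture management method according to another embodiment of the present application is shown. The method may specifically include the following steps:
Step 201: obtain the correspondingly updated latest product information by parsing product update records.
Product updates can be recorded as they occur, and the updated latest product information can later be obtained by reading these records.
Step 202: parse the link addresses of the latest pictures from the latest product information, and acquire the latest pictures according to the link addresses.
When acquiring the latest pictures, the link address of each latest picture can be obtained by parsing the latest product information, and the picture can then be fetched, according to the link address, from the location where the latest product's data is stored.
Step 203: acquire a plurality of latest pictures updated on the current day.
Step 204: upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, each day corresponding to one daily-increment library; a full library is also deployed in the distributed server cluster.
Step 205: when the execution time of a transmission thread is detected to exceed a preset time, end that transmission thread and start a new transmission thread to carry out the corresponding task in its place.
Transmission timeouts may occur during picture transfer, greatly increasing the overall transfer time or even causing the transfer to fail, so timeouts need to be controlled properly.
Because multiple transmission threads are used to upload pictures, an upper limit on transfer time (a timeout) can be set in advance for each transmission thread; if a thread has not finished when this time is exceeded, a transmission timeout is deemed to have occurred. When one or more transmission threads time out, the affected threads are forcibly closed and new transmission threads are started to carry out the tasks of the closed threads. Timeouts are thus detected and resolved promptly, so that a large number of pictures can be transferred to the distributed server cluster in the shortest possible time.
Step 206: monitor the network connection API; when a network connection exception notification issued by the network connection API is caught, end the transmission threads and start new transmission threads to carry out the corresponding tasks in their place.
Picture transfer may be affected by network disturbances that break the connection to the distributed server cluster and interrupt the transfer. The network connection therefore needs to be monitored during transfer, and the transfer task retried when a disconnection is detected. Network interruptions are thus detected and resolved promptly, so that a large number of pictures can be transferred to the distributed server cluster in the shortest possible time.
The approach preferred in the present application is to end all current transmission threads and start the same number of new threads, each carrying out one of the closed transfer tasks.
Network interruptions can be discovered by monitoring the network connection API. The network connection function is implemented through a low-level API (Application Programming Interface) of the Java language; when the network is interrupted, the API issues an exception notification, and catching this notification establishes that a network interruption has occurred.
The present application can handle both disconnections and timeouts by retrying. Because retries cannot continue indefinitely, a maximum number of retries can be set. For example, with at most 3 retries, if the task still cannot be completed after 3 retries, the transfer of that picture is skipped.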
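The bounded-retry policy above can be sketched with a thread pool. This is a simplified illustration under stated assumptions: the `upload` callable is hypothetical and is assumed to enforce its own timeout, raising `TimeoutError` on a timeout and `ConnectionError` on a dropped connection. Python cannot forcibly end a thread, so instead of killing and restarting threads as the text describes, the sketch simply retries the task inside the worker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def upload_with_retry(url: str, upload: Callable[[str], None], max_retries: int) -> bool:
    """Try one upload, retrying on timeout or disconnection up to `max_retries` times."""
    for _ in range(1 + max_retries):  # one initial attempt plus the retries
        try:
            upload(url)
            return True
        except (TimeoutError, ConnectionError):
            continue  # retry the same task
    return False  # give up on this picture, as the text suggests


def transfer_all(
    urls: Iterable[str],
    upload: Callable[[str], None],
    workers: int = 4,
    max_retries: int = 3,
) -> List[str]:
    """Upload all pictures in parallel and return the URLs that were skipped."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda u: upload_with_retry(u, upload, max_retries), url_list)
        )
    return [u for u, ok in zip(url_list, results) if not ok]
```

With `max_retries=3`, a picture that always fails is attempted four times in total before being skipped, matching the example in the text.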
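Step 207: by comparing picture indexes, save to the full library the latest pictures in the daily-increment library that do not exist in the full library, and save to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library.
In the embodiment of the present application, a latest picture whose corresponding original picture exists in the full library can be saved to the full library in place of the original, so that old pictures are updated with new ones.
Step 208: after receiving a request from an application to call pictures, extract the target pictures from the full library and return them to the application.
Step 209: delete the daily-increment libraries that fall outside a preset time period.
Because storage space is limited, the daily-increment libraries need not keep many days of data. A retention period can be set, for example 7 days, and expired daily-increment libraries are deleted according to that period.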
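A retention check of this kind might look like the sketch below. The function name `expired_daily_libraries` and the convention that a library is kept while it is fewer than `keep_days` days old are assumptions; the text only states that a retention period such as 7 days can be set.

```python
from datetime import date
from typing import Iterable, List


def expired_daily_libraries(
    library_dates: Iterable[date], today: date, keep_days: int = 7
) -> List[date]:
    """Return the dates of daily-increment libraries that are past the retention period.

    Assumed convention: a library is kept while it is fewer than `keep_days` days old.
    """
    return sorted(d for d in library_dates if (today - d).days >= keep_days)
```

The caller would then delete the directory of each returned date from the cluster.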
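Step 210: determine, by querying historical product access data, the online pictures corresponding to products still in use online, and/or determine, by querying historical picture call data, the online pictures still in use online.
Because the full library stores a large amount of historical data, it contains many "zombie pictures", including pictures of products that have been taken offline and pictures that have been deleted from products. This data needs to be removed; otherwise it will occupy a great deal of storage space over time. The online pictures corresponding to products still in use online can be determined by querying historical product access data, or the online pictures still in use can be determined by querying historical picture call data, or the two approaches can be combined.
Step 211: delete the pictures in the full library other than the online pictures.
The "zombie pictures" in the full library need to be deleted. Because the full library is huge, this cleanup cannot be done in one pass, so the full library can be cleaned in batches by category.
A correspondence can be set between the result of taking the category ID modulo a preset value and the individual dates (a particular day, a particular time of day, and so on) within a preset time period (a day, a week, a month, a year, and so on); when a date arrives, the corresponding category IDs are cleaned.
Preferably, a picture category whose modulo result equals the day of the week of the current day can be selected as the picture category to be cleaned. That is, the categories whose category ID modulo 7 equals the current day of the week are cleaned, which ensures that every category is cleaned within a week. For example, if category A has ID 9, whose result modulo 7 is 2, and category B has ID 8, whose result modulo 7 is 1, then category A can be cleaned on Tuesdays and category B on Mondays; if the current day is a Tuesday, category A is cleaned. The correspondence between the result of the category ID modulo 7 and a day of the week can also be set according to actual needs, for example cleaning on Friday when the result is 2 and on Monday when the result is 3. The cleanup time can also be set in any other suitable way; the present application places no limitation on this.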
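The modulo-7 schedule can be written down directly. In this sketch Monday through Saturday map to remainders 1 through 6, following the worked example above (ID 9 on Tuesday, ID 8 on Monday); mapping Sunday to remainder 0 is an assumption, since the text does not say which day handles it. The function name `categories_to_clean` is illustrative.

```python
from datetime import date
from typing import Iterable, List


def categories_to_clean(category_ids: Iterable[int], today: date) -> List[int]:
    """Select the categories whose ID modulo 7 matches today's day of the week."""
    # isoweekday(): Monday = 1 ... Sunday = 7; Sunday is mapped to remainder 0 (assumption).
    weekday = today.isoweekday() % 7
    return [c for c in category_ids if c % 7 == weekday]
```

Since every integer has exactly one remainder modulo 7, each category is selected on exactly one day of each week.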
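Correspondingly, deleting the pictures in the full library other than the online pictures means: for the picture category to be cleaned, deleting from the full library the pictures under that category other than the online pictures.
To help those skilled in the art better understand the present application, a massive picture management method implemented on the Hadoop platform is described below as an example. The solution of the present application may include picture transmission, picture storage, library update and data output, each of which is described in detail below. It should be noted that the "images" in FIGS. 3 to 7 are the pictures referred to in the present application.
I. Picture transmission
FIG. 3 shows a schematic flow chart of picture transmission in the present application. The specific process includes:
1. Obtain the product information modified on the current day.
Business data is queried to find the products modified on the current day, including newly published products and products whose text or pictures were modified. Because it is not possible to determine precisely which products had their pictures modified, the number of products obtained is relatively large.
2. Split the product information evenly.
The corresponding image information is built from the downloaded product information. The product data to be processed is obtained first, the picture URLs are then parsed from the product data, and the URLs are split evenly into N parts. A highly reliable transfer program is called to write the pictures into SequenceFiles on HDFS, with each part handled by its own picture upload unit. The upload units work in parallel, and this parallel upload speeds up picture transfer.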
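One simple way to split the URL list evenly into N parts is round-robin assignment, shown below. The text does not specify the splitting method, so this is an assumed choice; `split_balanced` is an illustrative name.

```python
from typing import Iterable, List


def split_balanced(items: Iterable[str], n: int) -> List[List[str]]:
    """Split items into n parts round-robin, so part sizes differ by at most one."""
    item_list = list(items)
    return [item_list[i::n] for i in range(n)]
```

Each resulting part would be handed to one upload unit; because the sizes differ by at most one, no unit receives noticeably more work than the others when pictures are of similar size.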
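3. Transfer the pictures to a temporary daily-increment library.
All upload units upload pictures to a temporary daily-increment library on HDFS, which stores the pictures of all products obtained in step 1. The transfer program is highly reliable: with a reconnect-on-disconnect mechanism and a timeout control mechanism in place, it ensures that a large number of pictures can be transferred to HDFS in the shortest possible time.
4. Compare indexes and build the daily-increment library.
The index stores the ID and the MD5 code of each picture in the picture library. A MapReduce task compares the pictures in the temporary directory with the index library and takes the picture data not present in the index as the content of the current day's daily-increment library.
5. Update the index.
An index is built for the picture data in the current day's daily-increment library and the library's index is updated, so that the next picture upload can use it for filtering.
6. Update the full library.
The current day's daily-increment library is written into the full library.
7. Self-cleaning of the daily-increment libraries.
Because storage space is limited, the daily-increment libraries need not keep many days of data; 7 days are generally kept. This step deletes expired daily-increment libraries from HDFS.
8. Self-cleaning of the full library.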
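II. Picture storage
FIG. 4 shows the storage structure of pictures in an example of the present application, and FIG. 5 shows a schematic diagram of multi-level picture categories in an example of the present application.
A SequenceFile stores data in K-V format. Here K is the picture ID and V is the picture's raw data (binary data) together with its metadata, which gives the storage structure shown in FIG. 4.
The picture metadata includes the picture's MD5 code, the category of the product to which the picture belongs, and so on; in later picture processing this metadata can be used for data filtering.
Because the picture library needs to offer flexible filtered access (for example, a user may need the pictures corresponding to certain product IDs under a certain category), the pictures in this library are not all stored together. Instead, they are stored hierarchically by category in the directory organization shown in FIG. 5, like individual partitions. For example, image01.seq, image02.seq and image03.seq are stored under a fourth-level category below the root directory of the image library. When only certain pictures under a given fourth-level category are needed, only that fourth-level category's data has to be taken as input, which greatly reduces the amount of data processed.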
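The category-partitioned layout can be illustrated with path handling alone. The root path `/image_library`, the numeric category IDs and the function names below are hypothetical; only the idea of nesting one directory per category level and the `imageNN.seq` file names come from the text.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def partition_dir(root: str, category_path: Iterable[int]) -> PurePosixPath:
    """Build the directory of a multi-level category, one path segment per level."""
    return PurePosixPath(root).joinpath(*(str(c) for c in category_path))


def files_under(category_dir: PurePosixPath, all_files: Iterable[str]) -> List[str]:
    """Keep only the files stored somewhere below the given category directory."""
    return [f for f in all_files if category_dir in PurePosixPath(f).parents]
```

A job that needs one fourth-level category then reads only the files returned by `files_under`, rather than scanning the whole library.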
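III. Library update
Updating the library involves three aspects:
1. Updating the daily-increment library
The picture transfer task is run every day to build the current day's daily-increment library, and expired daily-increment libraries are deleted.
2. Updating the full library
In the present invention the full library is updated daily: each day's daily-increment library is simply merged into the full library.
3. Cleaning the full library
This step deletes the "zombie pictures" from the full library. Because the full library is huge, the cleanup cannot be done in one pass, so the full library is cleaned in batches by category: each day, the categories whose category ID modulo 7 equals the current day of the week are cleaned, which ensures that every category is cleaned within a week.
FIG. 6 shows a schematic diagram of the picture cleanup steps in an example of the present application, which specifically include:
Step 1: Determine whether the category is due for cleanup on the current day.
If the category is due for cleanup on the current day, it is added to the cleanup list of image library files.
Step 2: Prepare the list of valid picture IDs.
Business data is queried to determine which picture IDs need to be kept; picture data not in this list will be deleted.
Step 3: Run the MapReduce cleanup task.
A MapReduce task compares the valid picture IDs with the picture data in the original library and removes the pictures that are no longer needed.
Step 4: Overwrite the original data with the cleaned data.
The original library data is replaced with the cleaned picture data, which completes the cleanup.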
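Steps 2 to 4 amount to filtering each due category by the valid-ID list and then swapping in the result. The sketch below does this in memory; it is a stand-in for the MapReduce task, and the dictionary keyed by category ID, the `Record` tuple and the function names are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[int, bytes]  # (picture ID, raw picture data plus metadata)


def clean_category(records: Iterable[Record], valid_ids: Set[int]) -> List[Record]:
    """Keep only the records whose picture ID is in the valid-ID list."""
    return [(pid, value) for pid, value in records if pid in valid_ids]


def clean_library(
    library: Dict[int, List[Record]], due_today: Iterable[int], valid_ids: Set[int]
) -> int:
    """Clean the categories due today and replace their data; return the number removed."""
    removed = 0
    for category in due_today:
        if category not in library:
            continue
        cleaned = clean_category(library[category], valid_ids)
        removed += len(library[category]) - len(cleaned)
        library[category] = cleaned  # overwrite the original data with the cleaned data
    return removed
```

Categories that are not due on the current day are left untouched, which keeps each day's job to roughly one seventh of the library.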
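IV. Data output
Picture output addresses how to supply the data input that downstream picture-processing programs need. FIG. 7 shows a schematic flow chart of picture output in an example of the present application, which specifically includes:
Step 1: Determine the list of required picture IDs.
The downstream program provides the required picture IDs as the input to the picture output step.
Step 2: Filter the library to obtain the picture data.
According to the picture list, the required picture data is fetched from the library. Here a MapReduce task performs a distributed comparison of the picture IDs against the library data to produce the result.
Step 3: Extract picture features.
Once the picture data has been obtained, features can be extracted by a distributed MapReduce task, using either a built-in feature extraction method or a feature extraction algorithm defined by the downstream program; the extracted features serve as the input to the downstream picture-processing task.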
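The output flow (filter by requested IDs, then apply a built-in or caller-supplied extractor) can be sketched as follows. The gray-level histogram stands in for a built-in feature; the function names are illustrative, and each picture is assumed to be already decoded into a flat list of 0 to 255 gray values, since decoding is outside the scope of this sketch.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Normalized histogram of 0-255 gray values, split into equal-width bins."""
    hist = [0] * bins
    width = 256 // bins
    for p in pixels:
        hist[min(p // width, bins - 1)] += 1
    total = len(pixels) or 1  # avoid division by zero for an empty picture
    return [h / total for h in hist]


def export_features(
    library: Dict[int, Sequence[int]],
    wanted_ids: Iterable[int],
    extractor: Callable[[Sequence[int]], List[float]] = gray_histogram,
) -> Dict[int, List[float]]:
    """Filter the library by the requested picture IDs and extract one feature per picture."""
    return {pid: extractor(library[pid]) for pid in wanted_ids if pid in library}
```

A downstream program that needs a different feature passes its own function as `extractor`, which mirrors the choice between built-in and user-defined extraction described above.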
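Referring to FIG. 8, a structural block diagram of a massive picture management apparatus according to an embodiment of the present application is shown, which may specifically include:
a picture acquiring module 301, configured to acquire a plurality of latest pictures updated on the current day;
a picture uploading module 302, configured to upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
a picture saving module 303, configured to save to the full library, by comparing picture indexes, the latest pictures in the daily-increment library that do not exist in the full library;
a picture feedback module 304, configured to extract, after receiving a request from an application to call pictures, the target pictures from the full library and return them to the application.
In the embodiment of the present application, preferably, the picture saving module includes:
an index comparison sub-module, configured to compare the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
a picture extraction sub-module, configured to extract the latest pictures whose picture indexes do not exist in the historical index library and save them to the full library.
In the embodiment of the present application, preferably, the apparatus further includes:
an index adding module, configured to add to the historical index library the picture indexes of the latest pictures added to the full library.
In the embodiment of the present application, preferably, the pictures in the full library are distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with its picture identifier and its multi-level picture category;
the picture feedback module includes:
a category parsing sub-module, configured to parse the request to obtain the target multi-level picture category of the required target pictures carried in the request;
a category-based extraction sub-module, configured to extract the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
In the embodiment of the present application, preferably, the picture feedback module is specifically configured to look up the target pictures in the full library, extract picture features of the target pictures and return the features to the application;
the picture index is the picture number and picture identifier of the picture.
According to the embodiments of the present application, the full set of product pictures is stored in the full library of a distributed server cluster, which meets the demands that processing and analyzing massive numbers of pictures place on the platform's storage and data-processing capacity. The latest pictures updated each day are stored in the daily-increment library, the new pictures that do not exist in the full library are determined by comparing picture indexes, and those new pictures are added to the full library. This avoids providing inaccurate product pictures to downstream applications and consuming excessive storage and computing resources.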
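Referring to FIG. 9, a structural block diagram of a massive picture management apparatus according to another embodiment of the present application is shown, which may specifically include:
a latest product parsing module 401, configured to obtain, before the plurality of latest pictures updated on the current day are acquired, the correspondingly updated latest product information by parsing product update records;
a link address access module 402, configured to parse the link addresses of the latest pictures from the latest product information and acquire the latest pictures according to the link addresses;
a picture acquiring module 403, configured to acquire a plurality of latest pictures updated on the current day;
a picture uploading module 404, configured to upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
a timeout processing module 405, configured to end a transmission thread when its execution time is detected to exceed a preset time, and to start a new transmission thread to carry out the corresponding task in its place;
a network disconnection processing module 406, configured to monitor a network connection API and, when a network connection exception notification issued by the network connection API is caught, end all transmission threads and start multiple new transmission threads to carry out the corresponding tasks in their place;
a picture saving module 407, configured to save to the full library, by comparing picture indexes, the latest pictures in the daily-increment library that do not exist in the full library, and to save to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library;
a picture feedback module 408, configured to extract, after receiving a request from an application to call pictures, the target pictures from the full library and return them to the application;
a library deletion module 409, configured to delete the daily-increment libraries that fall outside a preset time period;
a query module 410, configured to determine, by querying historical product access data, the online pictures corresponding to products still in use online, and/or to determine, by querying historical picture call data, the online pictures still in use online;
a picture deletion module 411, configured to delete the pictures in the full library other than the online pictures.
In the embodiment of the present application, preferably, a category search module is configured to find, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
the picture deletion module is specifically configured to delete from the full library, for the picture category to be cleaned, the pictures under that category other than the online pictures.
According to the embodiments of the present application, the full set of product pictures is stored in the full library of a distributed server cluster, which meets the demands that processing and analyzing massive numbers of pictures place on the platform's storage and data-processing capacity. The latest pictures updated each day are stored in the daily-increment library, the new pictures that do not exist in the full library are determined by comparing picture indexes, and those new pictures are added to the full library. This avoids providing inaccurate product pictures to downstream applications and consuming excessive storage and computing resources.
In the embodiments of the present application, a latest picture whose corresponding original picture exists in the full library can be saved to the full library in place of the original, so that old pictures are updated with new ones. After the latest pictures required by an application have been extracted, picture features can further be extracted and returned, which reduces the picture-processing load on the terminal where the application runs.
The embodiments of the present application support storing pictures in multiple storage areas of the server cluster according to their multi-level picture categories, so that a later lookup can extract pictures by multi-level category alone, which greatly improves lookup efficiency. In addition, within each storage area, multiple pictures can be organized by picture number into one large file for storage, which improves the efficiency of picture lookup and processing.
Because the apparatus and system embodiments essentially correspond to the method embodiments described above, details not covered in the description of these embodiments can be found in the relevant description of the preceding embodiments and are not repeated here.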
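The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Furthermore, the present application is not directed to any particular programming language. It should be understood that the content of the present application described herein can be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the present application.
Numerous specific details are set forth in the specification provided here. It will be understood, however, that embodiments of the present application can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the present application are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the Detailed Description are therefore expressly incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment of the present application.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and placed in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present application and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present application can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the massive picture management apparatus according to embodiments of the present application. The present application can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present application, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.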

Claims (22)

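  1. A massive picture management method, characterized by comprising:
    acquiring a plurality of latest pictures updated on the current day;
    uploading the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
    by comparing picture indexes, saving to the full library the latest pictures in the daily-increment library that do not exist in the full library;
    after receiving a request from an application to call pictures, extracting the target pictures from the full library and returning them to the application.
  2. The method according to claim 1, characterized in that, before the acquiring of the plurality of latest pictures updated on the current day, the method further comprises:
    obtaining the correspondingly updated latest product information by parsing product update records;
    parsing the link addresses of the latest pictures from the latest product information, and acquiring the latest pictures according to the link addresses.
  3. The method according to claim 1, characterized in that the saving to the full library, by comparing picture indexes, of the latest pictures in the daily-increment library that do not exist in the full library comprises:
    comparing the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
    extracting the latest pictures whose picture indexes do not exist in the historical index library and saving them to the full library.
  4. The method according to claim 3, characterized in that the method further comprises:
    adding to the historical index library the picture indexes of the latest pictures added to the full library.
  5. The method according to claim 1, characterized in that the pictures in the full library are distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with its picture identifier and its multi-level picture category;
    the extracting of the target pictures from the full library and returning them to the application after receiving the request comprises:
    parsing the request to obtain the target multi-level picture category of the required target pictures carried in the request;
    extracting the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
  6. The method according to claim 1, characterized in that each day corresponds to one daily-increment library, and the method further comprises:
    deleting the daily-increment libraries that fall outside a preset time period.
  7. The method according to claim 1, characterized in that the method further comprises:
    determining, by querying historical product access data, the online pictures corresponding to products still in use online, and/or determining, by querying historical picture call data, the online pictures still in use online;
    deleting the pictures in the full library other than the online pictures.
  8. The method according to claim 7, characterized in that the method further comprises:
    finding, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
    the deleting of the pictures in the full library other than the online pictures being: for the picture category to be cleaned, deleting from the full library the pictures under that category other than the online pictures.
  9. The method according to claim 1, characterized in that, while the latest pictures in the daily-increment library that do not exist in the full library are saved to the full library by comparing picture indexes, the method further comprises:
    saving to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library.
  10. The method according to claim 1, characterized in that the method further comprises:
    when the execution time of a transmission thread is detected to exceed a preset time, ending that transmission thread and starting a new transmission thread to carry out the corresponding task in its place;
    and/or monitoring a network connection API and, when a network connection exception notification issued by the network connection API is caught, ending all transmission threads and starting multiple new transmission threads to carry out the corresponding tasks in their place.
  11. The method according to claim 1, characterized in that the extracting of the target pictures from the full library and returning them to the application is: looking up the target pictures in the full library, extracting picture features of the target pictures and returning the features to the application;
    the picture index is the picture number and picture identifier of the picture.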
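  12. A massive picture management apparatus, characterized by comprising:
    a picture acquiring module, configured to acquire a plurality of latest pictures updated on the current day;
    a picture uploading module, configured to upload the latest pictures in parallel, through multiple transmission threads, to a daily-increment library preset in a distributed server cluster, a full library also being deployed in the distributed server cluster;
    a picture saving module, configured to save to the full library, by comparing picture indexes, the latest pictures in the daily-increment library that do not exist in the full library;
    a picture feedback module, configured to extract, after receiving a request from an application to call pictures, the target pictures from the full library and return them to the application.
  13. The apparatus according to claim 12, characterized in that the apparatus further comprises:
    a latest product parsing module, configured to obtain, before the plurality of latest pictures updated on the current day are acquired, the correspondingly updated latest product information by parsing product update records;
    a link address access module, configured to parse the link addresses of the latest pictures from the latest product information and acquire the latest pictures according to the link addresses.
  14. The apparatus according to claim 11, characterized in that the picture saving module comprises:
    an index comparison sub-module, configured to compare the picture indexes of the latest pictures in the daily-increment library with a preset historical index library, the historical index library storing the picture indexes of all pictures in the full library;
    a picture extraction sub-module, configured to extract the latest pictures whose picture indexes do not exist in the historical index library and save them to the full library.
  15. The apparatus according to claim 14, characterized in that the apparatus further comprises:
    an index adding module, configured to add to the historical index library the picture indexes of the latest pictures added to the full library.
  16. The apparatus according to claim 12, characterized in that the pictures in the full library are distributed across multiple storage areas of the server cluster according to the multi-level picture categories to which they belong, the pictures in each storage area are stored in order of their picture numbers, and each picture is marked with its picture identifier and its multi-level picture category;
    the picture feedback module comprises:
    a category parsing sub-module, configured to parse the request to obtain the target multi-level picture category of the required target pictures carried in the request;
    a category-based extraction sub-module, configured to extract the target pictures from the full library according to the storage location in the storage areas corresponding to each level of the multi-level picture category, and according to the picture identifier and multi-level picture category marked on each picture.
  17. The apparatus according to claim 12, characterized in that each day corresponds to one daily-increment library, and the apparatus further comprises:
    a library deletion module, configured to delete the daily-increment libraries that fall outside a preset time period.
  18. The apparatus according to claim 12, characterized in that the apparatus further comprises:
    a query module, configured to determine, by querying historical product access data, the online pictures corresponding to products still in use online, and/or to determine, by querying historical picture call data, the online pictures still in use online;
    a picture deletion module, configured to delete the pictures in the full library other than the online pictures.
  19. The apparatus according to claim 18, characterized in that the apparatus further comprises:
    a category search module, configured to find, as the picture category to be cleaned, a picture category whose modulo value equals the day of the week of the current day;
    the picture deletion module being specifically configured to delete from the full library, for the picture category to be cleaned, the pictures under that category other than the online pictures.
  20. The apparatus according to claim 12, characterized in that the apparatus further comprises:
    a picture replacement module, configured to save to the full library, in place of the original pictures, the latest pictures whose corresponding original pictures exist in the full library, while the latest pictures in the daily-increment library that do not exist in the full library are saved to the full library by comparing picture indexes.
  21. The apparatus according to claim 12, characterized in that the apparatus further comprises:
    a timeout processing module, configured to end a transmission thread when its execution time is detected to exceed a preset time, and to start a new transmission thread to carry out the corresponding task in its place;
    and/or a network disconnection processing module, configured to monitor a network connection API and, when a network connection exception notification issued by the network connection API is caught, end all transmission threads and start multiple new transmission threads to carry out the corresponding tasks in their place.
  22. The apparatus according to claim 12, characterized in that the picture feedback module is specifically configured to look up the target pictures in the full library, extract picture features of the target pictures and return the features to the application;
    the picture index is the picture number and picture identifier of the picture.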
PCT/CN2016/106326 2015-11-27 2016-11-18 一种海量图片管理方法和装置 WO2017088701A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510849675.8A CN106815223B (zh) 2015-11-27 2015-11-27 一种海量图片管理方法和装置
CN201510849675.8 2015-11-27

Publications (1)

Publication Number Publication Date
WO2017088701A1 true WO2017088701A1 (zh) 2017-06-01

Family

ID=58763034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106326 WO2017088701A1 (zh) 2015-11-27 2016-11-18 一种海量图片管理方法和装置

Country Status (2)

Country Link
CN (1) CN106815223B (zh)
WO (1) WO2017088701A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558503A (zh) * 2018-11-16 2019-04-02 努比亚技术有限公司 表情包显示方法、移动终端及计算机可读存储介质
WO2020134990A1 (zh) * 2018-12-29 2020-07-02 益萃网络科技(中国)有限公司 产品信息的查询方法、装置、计算机设备及存储介质
CN113010812A (zh) * 2021-03-10 2021-06-22 北京百度网讯科技有限公司 信息采集方法、装置、电子设备和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932343B (zh) * 2018-07-24 2020-03-27 南京甄视智能科技有限公司 人脸图像数据库的数据集清洗方法与系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510217A (zh) * 2009-03-09 2009-08-19 阿里巴巴集团控股有限公司 图像数据库中的图像更新方法、服务器及系统
CN101556584A (zh) * 2008-04-10 2009-10-14 深圳市万水千山网络发展有限公司 一种实现图片交易的计算机系统及方法
CN102122389A (zh) * 2010-01-12 2011-07-13 阿里巴巴集团控股有限公司 一种图像相似性判断的方法及装置
CN104219270A (zh) * 2013-06-05 2014-12-17 北京齐尔布莱特科技有限公司 多张图片从客户端快速高效上传至服务器的方法
US9069885B1 (en) * 2010-11-26 2015-06-30 CodeGuard, Inc. Systems and methods for automated retrieval, monitoring, and storage of online content

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101298647B1 (ko) * 2007-05-25 2013-08-21 삼성전자주식회사 디지털 영상 처리 장치에서 D-Day 설정 및 관리 방법
CN102436491A (zh) * 2011-11-08 2012-05-02 张三明 一种基于BigBase的海量图片搜索系统及方法
CN102622291A (zh) * 2012-03-13 2012-08-01 苏州阔地网络科技有限公司 一种进程的监控方法及系统
CN103457973B (zh) * 2012-06-01 2016-04-27 深圳市腾讯计算机系统有限公司 一种图片上传方法、系统、图片上传客户端及网络服务器
CN103049491A (zh) * 2012-12-07 2013-04-17 深圳市同洲电子股份有限公司 一种图片文件的管理方法及装置
CN103970516B (zh) * 2013-01-30 2015-10-07 腾讯科技(深圳)有限公司 冗余图片删除方法及装置
CN104199899A (zh) * 2014-08-26 2014-12-10 浪潮(北京)电子信息产业有限公司 一种基于Hbase的海量图片存储方法及装置
CN104317805A (zh) * 2014-09-23 2015-01-28 广州金山网络科技有限公司 更新弹窗图片库的方法、弹窗图片库更新装置及系统
CN104298747A (zh) * 2014-10-13 2015-01-21 福建星海通信科技有限公司 大数据量图片的存储方法、以及检索方法
CN104750811A (zh) * 2015-03-30 2015-07-01 浪潮通信信息系统有限公司 一种移动通信数据文件多线程实时采集方法
CN104881296A (zh) * 2015-06-17 2015-09-02 北京奇虎科技有限公司 基于iOS系统的图片删除方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556584A (zh) * 2008-04-10 2009-10-14 深圳市万水千山网络发展有限公司 一种实现图片交易的计算机系统及方法
CN101510217A (zh) * 2009-03-09 2009-08-19 阿里巴巴集团控股有限公司 图像数据库中的图像更新方法、服务器及系统
CN102122389A (zh) * 2010-01-12 2011-07-13 阿里巴巴集团控股有限公司 一种图像相似性判断的方法及装置
US9069885B1 (en) * 2010-11-26 2015-06-30 CodeGuard, Inc. Systems and methods for automated retrieval, monitoring, and storage of online content
CN104219270A (zh) * 2013-06-05 2014-12-17 北京齐尔布莱特科技有限公司 多张图片从客户端快速高效上传至服务器的方法

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
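CN109558503A (zh) * 2018-11-16 2019-04-02 努比亚技术有限公司 Emoticon pack display method, mobile terminal, and computer-readable storage medium
CN109558503B (zh) * 2018-11-16 2024-05-10 努比亚技术有限公司 Emoticon pack display method, mobile terminal, and computer-readable storage medium
WO2020134990A1 (zh) * 2018-12-29 2020-07-02 益萃网络科技(中国)有限公司 Product information query method and apparatus, computer device, and storage medium
CN113010812A (zh) * 2021-03-10 2021-06-22 北京百度网讯科技有限公司 Information collection method and apparatus, electronic device, and storage medium
CN113010812B (zh) * 2021-03-10 2023-07-25 北京百度网讯科技有限公司 Information collection method and apparatus, electronic device, and storage medium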

Also Published As

Publication number Publication date
CN106815223B (zh) 2020-10-27
CN106815223A (zh) 2017-06-09

Similar Documents

Publication Publication Date Title
US11860874B2 (en) Multi-partitioning data for combination operations
US11151137B2 (en) Multi-partition operation in combination operations
US11474673B1 (en) Handling modifications in programming of an iterative message processing system
US11334543B1 (en) Scalable bucket merging for a data intake and query system
US10761813B1 (en) Assisted visual programming for iterative publish-subscribe message processing system
US20230169086A1 (en) Event driven extract, transform, load (etl) processing
US10776441B1 (en) Visual programming for iterative publish-subscribe message processing system
US11409756B1 (en) Creating and communicating data analyses using data visualization pipelines
CN108053863B (zh) Massive medical data storage system and data storage method suitable for large and small files
US11567993B1 (en) Copying buckets from a remote shared storage system to memory associated with a search node for query execution
CN113254466B (zh) Data processing method and apparatus, electronic device, and storage medium
US11574242B1 (en) Guided workflows for machine learning-based data analyses
WO2017088701A1 (zh) Massive picture management method and device
US12014255B1 (en) Generating machine learning-based outlier detection models using timestamped event data
US11573971B1 (en) Search and data analysis collaboration system
US11450419B1 (en) Medication security and healthcare privacy systems
US11922222B1 (en) Generating a modified component for a data intake and query system using an isolated execution environment image
US10698863B2 (en) Method and apparatus for clearing data in cloud storage system
US11892976B2 (en) Enhanced search performance using data model summaries stored in a remote data store
US11687487B1 (en) Text files updates to an active processing pipeline
US11630744B2 (en) Methods and systems relating to network based storage retention
US20210232603A1 (en) Capturing data lake changes
CN112084190A (zh) Big-data-based system and method for real-time storage and management of collected data
WO2022261249A1 (en) Distributed task assignment, distributed alerts and supression management, and artifact life tracking storage in a cluster computing system
US11714698B1 (en) System and method for machine-learning based alert prioritization

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16867924; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 16867924; Country of ref document: EP; Kind code of ref document: A1