WO2022083332A1 - Commodity data management method and apparatus, and server - Google Patents

Info

Publication number
WO2022083332A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2021/116999
Other languages
French (fr)
Chinese (zh)
Inventor
常亚
王刚
胡小清
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022083332A1

Classifications

    • G06F16/275 Synchronous replication
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/242 Query formulation
    • G06F16/2428 Query predicate definition using graphical user interfaces, including menus and forms
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/532 Query formulation, e.g. graphical querying
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/9532 Query formulation
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Definitions

  • The present application belongs to the technical field of data management, and in particular relates to a commodity data management method, apparatus, and server.
  • E-commerce has become a mainstream way for the public to shop.
  • The CP may be a merchant or a party other than the merchant.
  • Existing commodity management systems require the CP to provide structured commodity data.
  • The commodity management system stores this structured commodity data and provides commodity search services.
  • This approach requires the CP to have a certain level of data-operation capability, which makes the operational threshold of the process relatively high.
  • In addition, the number of commodities tends to be large.
  • The CP therefore has to invest considerable manpower and material resources in these operations, so the workload of providing commodity data is often heavy.
  • In summary, the prior art manages commodity data inefficiently and imposes a relatively high operational threshold on the CP, which hinders effective management of commodity data.
  • The embodiments of the present application provide a commodity data management method, apparatus, and server, which can solve the problem of low commodity data management efficiency in the prior art.
  • A first aspect of the embodiments of the present application provides a commodity data management method, applied to a server, including:
  • Commodity data is acquired and divided into at least one first sub-file, where each first sub-file contains attribute data of at least one commodity. Attribute data verification is performed on each first sub-file, and the verified attribute data is stored in a database.
  • With this method, the CP only needs to provide the commodity data in the required format for it to be imported offline into the database. In practice the CP already has to organize its commodity data anyway, whether for inventory management or for listing on e-commerce platforms, so arranging that data in the required format involves little extra work.
  • The commodity management system splits the commodity data into multiple sub-files (the first sub-files), performs attribute data verification on each sub-file, and uploads the attribute data that passes verification.
  • The server may verify the sub-files serially or in parallel. With parallel processing, the server verifies multiple sub-files at the same time, which improves verification efficiency.
  • The embodiments of the present application therefore greatly lower the technical threshold for the CP and offer better usability.
  • Automatic verification and storage of commodity data also greatly improve the efficiency of commodity data management.
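The splitting step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the commodity data arrives as CSV text with a header row, and the function name `split_into_subfiles` and the `rows_per_subfile` parameter are hypothetical. Each sub-file repeats the header so that it can be verified on its own, which is what allows sub-files to be verified serially or in parallel.

```python
import csv
import io
from typing import List


def split_into_subfiles(table_text: str, rows_per_subfile: int) -> List[str]:
    """Split CSV commodity data into CSV sub-files (hypothetical helper).

    Each sub-file repeats the header row and holds the attribute data of
    at most `rows_per_subfile` commodities.
    """
    if rows_per_subfile < 1:
        raise ValueError("rows_per_subfile must be >= 1")
    reader = csv.reader(io.StringIO(table_text))
    header = next(reader)          # first row: attribute names
    rows = list(reader)            # remaining rows: one commodity each
    subfiles: List[str] = []
    for start in range(0, len(rows), rows_per_subfile):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_subfile])
        subfiles.append(buf.getvalue())
    return subfiles
```

For example, a three-commodity table split with `rows_per_subfile=2` yields two sub-files, the second containing only the header and the third commodity.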
  • Performing attribute data verification on each first sub-file and storing the verified attribute data in a database includes:
  • One sub-file is selected from the at least one first sub-file as the second sub-file.
  • After the second sub-file has been verified, the process returns to the operation of selecting a sub-file from the at least one first sub-file as the second sub-file, until all first sub-files have been verified.
  • The server thus cyclically selects and processes each first sub-file in turn (as the second sub-file), verifying the data in each one and synchronously storing the commodity data in the sub-file into the database. This makes warehousing more efficient and enables efficient management of commodity data.
  • Attribute data verification is performed on the second sub-file, and the verified attribute data in the second sub-file is uploaded to the database. In addition, the method further includes:
  • First subtasks in one-to-one correspondence with the at least one first sub-file are created in the database.
  • A second subtask is determined from the first subtasks stored in the database, and the second sub-file associated with the second subtask is acquired from the at least one first sub-file.
  • In this way, the database can effectively record the verification status of each sub-file, and the server can easily determine which sub-file to verify each time.
  • Determining a second subtask from the first subtasks stored in the database includes:
  • The subtasks to be executed among the first subtasks are acquired. The subtasks to be executed include first subtasks that have not been executed, and first subtasks that are being executed and whose execution duration exceeds a duration threshold.
  • A second subtask is determined from the subtasks to be executed.
  • Whether a subtask is to be executed is determined based on its execution state.
  • Unexecuted subtasks need to be processed by a server.
  • In addition, a server may be unable to process a subtask normally, for example because of downtime.
  • In that case the subtask remains in the executing state but is never completed: waiting longer for that server will neither finish the subtask nor verify its sub-file, so the subtask must be reprocessed by another server.
  • Based on these two considerations, both unexecuted subtasks and subtasks that are being executed but have exceeded the execution duration threshold are treated as subtasks to be executed.
  • The server obtains all current subtasks to be executed in real time and determines the second subtask from among them.
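One way to keep the first subtasks in the database and query the subtasks to be executed is sketched below with Python's built-in `sqlite3` module. The table layout, the state names `'unexecuted'` and `'executing'`, the helper names, and the 600-second threshold are all assumptions made for illustration; the source does not specify a schema. A completed subtask would be moved to some other state (not shown) so that it no longer matches either condition.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical duration threshold (seconds) after which an executing
# subtask is considered stuck and becomes eligible again.
DURATION_THRESHOLD_S = 600.0


def create_subtasks(conn: sqlite3.Connection, subfile_ids: Sequence[str]) -> None:
    """Create one first subtask per first sub-file (one-to-one)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS subtask (
               id         INTEGER PRIMARY KEY,
               subfile_id TEXT UNIQUE NOT NULL,
               state      TEXT NOT NULL DEFAULT 'unexecuted',
               started_at REAL
           )"""
    )
    conn.executemany(
        "INSERT INTO subtask (subfile_id) VALUES (?)", [(s,) for s in subfile_ids]
    )
    conn.commit()


def mark_executing(conn: sqlite3.Connection, subfile_id: str, started_at: float) -> None:
    """Record that a server has started executing this subtask."""
    conn.execute(
        "UPDATE subtask SET state = 'executing', started_at = ? WHERE subfile_id = ?",
        (started_at, subfile_id),
    )
    conn.commit()


def subtasks_to_execute(
    conn: sqlite3.Connection, now: float, threshold: float = DURATION_THRESHOLD_S
) -> List[str]:
    """Unexecuted subtasks, plus executing ones whose duration exceeds the threshold."""
    rows = conn.execute(
        """SELECT subfile_id FROM subtask
           WHERE state = 'unexecuted'
              OR (state = 'executing' AND ? - started_at > ?)
           ORDER BY id""",
        (now, threshold),
    ).fetchall()
    return [r[0] for r in rows]
```

Letting the database evaluate both conditions in one query keeps every server's view of "to be executed" consistent with the shared state.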
  • Determining a second subtask from the subtasks to be executed includes:
  • Distributed locks for the subtasks to be executed are requested from the cache component one by one.
  • If the distributed lock for a subtask to be executed is acquired successfully, that subtask is taken as the second subtask.
  • Multiple servers may process the subtasks at the same time.
  • The server that executes each solution of the first aspect is one of these subtask-processing servers.
  • Multiple servers may therefore select the same subtask at the same time, which reduces the processing efficiency of the commodity data.
  • To avoid this, after receiving the subtasks to be executed, the server first selects one of them and tries to apply to the cache component for that subtask's distributed lock. The distributed lock of a single subtask can be assigned to only one server.
  • If no other server has applied for the lock, the server can obtain the distributed lock for the subtask.
  • If another server has already applied for it, the cache component has recorded that the lock is held by that server, so the lock cannot be acquired.
  • If the lock is acquired, the server determines that the subtask is the one to execute this time and downloads the corresponding sub-file. Conversely, if acquiring the distributed lock fails, the server re-executes the subtask selection operation to choose another suitable subtask.
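The lock-based selection can be sketched as follows. The cache component is replaced here by a hypothetical in-memory class whose `try_lock` succeeds only if no other owner holds the key, which mirrors set-if-absent semantics such as the Redis `SET` command with the `NX` option; a real deployment would call the actual cache service instead. The key prefix `subtask-lock:` and all class and function names are illustrative.

```python
import threading
from typing import Dict, Iterable, Optional


class InMemoryLockCache:
    """Hypothetical stand-in for the cache component.

    A lock key can be held by only one owner at a time (set-if-absent semantics).
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()  # makes check-and-set atomic across threads

    def try_lock(self, key: str, owner: str) -> bool:
        with self._mutex:
            if key in self._owners:
                return False  # already assigned to another server
            self._owners[key] = owner
            return True

    def unlock(self, key: str, owner: str) -> bool:
        with self._mutex:
            if self._owners.get(key) != owner:
                return False  # only the holder may release its lock
            del self._owners[key]
            return True


def choose_second_subtask(
    cache: InMemoryLockCache, server_id: str, candidates: Iterable[str]
) -> Optional[str]:
    """Request locks one by one; the first subtask whose lock succeeds is the second subtask."""
    for task_id in candidates:
        if cache.try_lock(f"subtask-lock:{task_id}", server_id):
            return task_id
    return None  # every candidate is held by another server; re-query later
```

With two candidates, the first server to ask gets `t1`, the next gets `t2`, and a third server gets nothing until one of the locks is released.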
  • The process of performing attribute data verification on the second sub-file further includes:
  • If the verification duration reaches the duration threshold, the distributed lock on the second sub-file is released.
  • This prevents a faulty server from occupying a subtask for a long time, which would reduce the execution efficiency of the subtasks.
  • The server times its own verification of the subtask and checks whether the duration threshold has been reached. If it has, the server's verification of the subtask has timed out, possibly because the server itself is faulty. The server therefore releases the distributed lock on the sub-file so that other servers can execute the subtask. This achieves automatic takeover of subtasks by other nodes and greatly improves the reliability of subtask execution.
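The server-side timeout can be sketched as a loop that checks the elapsed time before verifying each record. The `verify` and `release_lock` callbacks are hypothetical hooks (for example, `release_lock` could call the cache component's unlock operation), and the clock is injectable so the behavior can be tested without waiting. Because the check runs only between records, this sketch would not interrupt a single record that hangs.

```python
import time
from typing import Callable, Iterable


def verify_with_deadline(
    records: Iterable[str],
    verify: Callable[[str], None],
    release_lock: Callable[[], None],
    threshold_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Verify the records of one sub-file while timing the server's own work.

    Returns True if every record was verified. If the elapsed time reaches
    threshold_s first, the lock is released (so another server can take over
    the subtask) and False is returned.
    """
    start = clock()
    for record in records:
        if clock() - start >= threshold_s:
            release_lock()  # give up the subtask instead of holding it
            return False
        verify(record)
    return True
```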
  • In a sixth possible implementation of the first aspect, if the commodity data contains attribute data that fails verification, exception information about the attribute data that failed verification is acquired and stored in the database.
  • Specifically, the parsed attribute data is checked for validity, that is, each item of attribute data is checked for problems such as missing or erroneous data. If such a problem exists, verification of that attribute data fails.
  • The embodiments of the present application upload the exception information corresponding to data parsing exceptions to the database, where it is recorded. On this basis, the exception information can be fed back to the CP, or the CP can query it directly. The CP can thus quickly learn which data has problems, check it in a targeted manner, and resubmit it for warehousing. This improves the efficiency of warehousing the commodity data.
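A validity check of the kind described above might look like the following sketch. The required attribute names (`sku`, `name`, `price`) and the price rules are hypothetical examples; the source only states that missing or erroneous data causes verification to fail. Each failure produces an exception record that could then be written to the database for the CP to query.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical required attributes for each commodity row.
REQUIRED_FIELDS = ("sku", "name", "price")


@dataclass
class ExceptionInfo:
    row: int     # index of the commodity within the sub-file
    field: str   # attribute that failed verification
    reason: str  # short description stored for the CP


def validate_rows(
    rows: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[ExceptionInfo]]:
    """Split parsed rows into verified rows and exception information."""
    valid: List[Dict[str, str]] = []
    errors: List[ExceptionInfo] = []
    for i, row in enumerate(rows):
        # Missing data: a required attribute is absent or blank.
        row_errors = [
            ExceptionInfo(i, f, "missing")
            for f in REQUIRED_FIELDS
            if not (row.get(f) or "").strip()
        ]
        # Erroneous data: the price is present but not a valid non-negative number.
        price = (row.get("price") or "").strip()
        if price:
            try:
                if float(price) < 0:
                    row_errors.append(ExceptionInfo(i, "price", "negative"))
            except ValueError:
                row_errors.append(ExceptionInfo(i, "price", "not a number"))
        if row_errors:
            errors.extend(row_errors)
        else:
            valid.append(row)
    return valid, errors
```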
  • The commodity data is in a data table format.
  • A data table is a common data recording format, and it is the format many CPs already use when organizing commodity data day to day. If commodity data is required in data table format, the CP only needs to lightly organize its original commodity data to obtain the data the commodity management system requires. This greatly reduces the technical threshold and workload for the CP and thereby improves the efficiency of commodity data management.
  • Uploading the attribute data in the second sub-file to the database includes:
  • Each time the attribute data of a single commodity in the second sub-file has been verified, the attribute data that passed verification is uploaded to the database; or
  • After all attribute data in the second sub-file has been verified, the attribute data that passed verification is uploaded to the database.
  • Option 1: the sub-file is verified while commodity attribute data is being stored, with the attribute data of a single commodity as the unit of verification and storage each time (corresponding to the embodiment shown in FIG. 2A).
  • Option 2: commodity attribute data is stored only after all attribute data of a single sub-file has been verified (corresponding to the embodiment shown in FIG. 3A).
  • The operation granularity of Option 1 is a single commodity, whereas that of Option 2 is a single sub-file.
  • In Option 1, the server must interact with the database many times. This consumes more network resources and places higher demands on the quality of the network connection between the server and the database.
  • However, in Option 1 the attribute data or exception information of each commodity in the sub-file is stored in the database synchronously. If the server fails, the database still holds all commodity attribute data in the current sub-file that was verified before the failure.
  • On this basis, when another server re-verifies the sub-file, it can either start from the beginning or continue with only the commodity attribute data in the sub-file that has not yet been stored. The fault-tolerance mechanism of Option 1 is therefore more complete, and its fault tolerance is high.
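Option 1's resume behavior can be sketched with `sqlite3`: each commodity's result is committed individually, so a server that takes over a sub-file can start from the first commodity not yet stored. The `result` table, its columns, and the helper names are assumptions for illustration. Using the highest stored row index as the resume point is only valid because this loop stores commodities strictly in order.

```python
import sqlite3
from typing import Callable, List


def init_results(conn: sqlite3.Connection) -> None:
    # One row per commodity: ok=1 for stored attribute data, ok=0 for exception info.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS result (
               subfile_id TEXT, row_idx INTEGER, payload TEXT, ok INTEGER,
               PRIMARY KEY (subfile_id, row_idx))"""
    )


def next_row_to_verify(conn: sqlite3.Connection, subfile_id: str) -> int:
    """Index of the first commodity in the sub-file that has not been stored yet."""
    (last,) = conn.execute(
        "SELECT MAX(row_idx) FROM result WHERE subfile_id = ?", (subfile_id,)
    ).fetchone()
    return 0 if last is None else last + 1


def store_per_commodity(
    conn: sqlite3.Connection,
    subfile_id: str,
    rows: List[str],
    is_valid: Callable[[str], bool],
    resume: bool = True,
) -> int:
    """Option 1: verify and store one commodity at a time; return the start index."""
    start = next_row_to_verify(conn, subfile_id) if resume else 0
    for idx in range(start, len(rows)):
        ok = is_valid(rows[idx])
        conn.execute(
            "INSERT OR REPLACE INTO result VALUES (?, ?, ?, ?)",
            (subfile_id, idx, rows[idx], int(ok)),
        )
        conn.commit()  # one database round trip per commodity
    return start
```

Passing `resume=False` corresponds to the other choice described above: re-verifying the sub-file from the beginning, with `INSERT OR REPLACE` overwriting earlier results.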
  • The attribute data in the commodity data includes a commodity image download address.
  • The method further includes:
  • Image feature analysis is performed on the pictures of commodities that have been stored, and the results are stored in a feature database for use in subsequent commodity searches by users. The embodiments of the present application can therefore provide data support for subsequent commodity searches.
  • Performing image feature analysis on a commodity picture to obtain image feature data includes:
  • A commodity picture may contain multiple objects. If feature analysis were performed directly on the whole picture, the resulting image feature data would also describe those other objects, which hampers subsequent image matching. Therefore, in the embodiments of the present application, commodity detection is performed on the commodity picture before image feature analysis, and the commodity region is cropped according to the detection result and then analyzed. The extracted commodity feature data therefore matches the commodity itself more closely and is more accurate and reliable, which further improves the accuracy and reliability of subsequent commodity searches.
  • A second aspect of the embodiments of the present application provides a commodity search method, applied to a server, the method including:
  • Image feature analysis is performed on a first commodity picture to obtain first image feature data, where the first commodity picture is a picture uploaded by a user terminal.
  • At least one item of second image feature data with the highest feature matching degree with the first image feature data is determined.
  • In this way, commodities that have already been stored can be searched accurately and quickly.
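Determining the second image feature data with the highest matching degree can be sketched as a top-k nearest-neighbor search. The source does not name a similarity measure; cosine similarity, a brute-force scan over an in-memory dictionary, and the function names are assumptions here. A large feature database would more likely use an approximate nearest-neighbor index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def top_k_matches(
    query: Sequence[float], feature_db: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored commodities whose feature vectors best match the query."""
    scored = ((cosine(query, vec), cid) for cid, vec in feature_db.items())
    return [(cid, score) for score, cid in heapq.nlargest(k, scored)]
```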
  • The retrieved commodity data is re-ranked according to the trademark information in the commodity picture before being returned to the user, so that the attribute data and commodity pictures of commodities most similar to the one the user is searching for are shown first on the user terminal. This improves the accuracy and relevance of the search results.
  • The second commodity pictures in one-to-one correspondence with the at least one item of second image feature data, and their associated attribute data, are sent to the user terminal.
  • Before the second commodity pictures corresponding to the second image feature data and their associated attribute data are sent to the user terminal, the method further includes:
  • Target trademark information of each target commodity is acquired, where a target commodity is a commodity associated with the second image feature data, and the second commodity picture and associated attribute data are the commodity picture and attribute data of that target commodity.
  • The trademark information in the commodity picture uploaded by the user is matched against the target trademark information of each target commodity, and the target commodities are sorted by matching degree. This re-ranks the target commodities based on trademark information.
  • The target trademark information includes the second trademark information and/or the third trademark information.
  • The second trademark information is the trademark information contained in the second commodity picture associated with the target commodity.
  • The third trademark information is the trademark information contained in the attribute data associated with the target commodity.
  • The trademark information of a target commodity may therefore be the trademark information contained in its commodity picture, the trademark information recorded in its attribute data, or both. The embodiments of the present application can thus adapt to various practical situations when obtaining the trademark information of a target commodity, which ensures reliable trademark matching. When both sources are used, the probability of obtaining the target commodity's trademark information is higher.
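The trademark-based re-ranking can be sketched as a sort that puts target commodities whose trademark information matches the user's picture first and then orders by feature similarity. The `Candidate` type, case-insensitive exact matching, and treating the matching degree as a simple match/no-match are simplifying assumptions; how the trademark in the uploaded picture is recognized is out of scope here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Candidate:
    commodity_id: str
    feature_score: float  # similarity from the image-feature search
    picture_trademarks: Set[str] = field(default_factory=set)  # second trademark info
    attribute_brand: Optional[str] = None  # third trademark info


def target_trademarks(c: Candidate) -> Set[str]:
    """Union of second and third trademark information, case-folded."""
    marks = {m.casefold() for m in c.picture_trademarks}
    if c.attribute_brand:
        marks.add(c.attribute_brand.casefold())
    return marks


def rerank_by_trademark(
    candidates: List[Candidate], first_trademark: Optional[str]
) -> List[Candidate]:
    """Matching trademarks first, then by descending feature score."""
    if not first_trademark:
        return sorted(candidates, key=lambda c: c.feature_score, reverse=True)
    wanted = first_trademark.casefold()
    return sorted(
        candidates,
        key=lambda c: (wanted in target_trademarks(c), c.feature_score),
        reverse=True,
    )
```

Taking the union of both sources reflects the point above: a commodity still matches if its trademark appears only in its picture or only in its attribute data.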
  • Performing image feature analysis on the first commodity picture to obtain the first image feature data includes:
  • The image feature analysis model is extracted from a neural network model trained on commodity picture samples and attribute data samples of multiple sample commodities.
  • Because the image feature analysis model is trained on data in two dimensions, commodity pictures and attribute data, it improves the accuracy of feature analysis and the reliability of subsequent feature matching.
  • The image feature analysis model is obtained as follows:
  • The initial model performs feature extraction on commodity information used as sample data, and a first loss function for the commodity information is calculated from the extracted text features and the corresponding classification labels.
  • The initial model extracts image features from the commodity pictures used as sample data, and a second loss function for the commodity pictures is calculated from the image features and the corresponding classification labels.
  • A third loss function is calculated based on the first loss function and the second loss function, and the initial model is iteratively updated according to the value of the third loss function until a preset convergence condition is met, yielding a trained model.
  • The networks used for feature extraction of commodity pictures are extracted from the trained model, and together they form the image feature analysis model.
  • In other words, classification-model training is used to process the commodity pictures and the commodity information of the sample commodities separately.
  • Multi-modal fusion training is then performed: the loss values of the two dimensions are fused through a new loss function, and the model is iteratively updated based on the fused loss value.
  • Finally, the networks used for feature extraction of commodity pictures are extracted (that is, the feature extraction networks for commodity information are discarded) to form a new model for image feature analysis.
  • An image feature analysis model trained in this way extracts commodity picture features more accurately and reliably, and the resulting image feature data characterizes commodity pictures better.
  • Image feature data extracted with this model therefore achieves high accuracy in commodity picture matching.
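The loss fusion can be illustrated for a single sample. The source states only that the third loss is calculated from the first and second losses; the weighted sum below, the `weight` parameter, and the use of softmax cross-entropy for both classification losses are assumptions. The networks and the optimizer that would iteratively update the model are omitted.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable cross-entropy of softmax(logits) against a class index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def fused_loss(
    text_logits: Sequence[float],
    image_logits: Sequence[float],
    label: int,
    weight: float = 0.5,
) -> float:
    first = softmax_cross_entropy(text_logits, label)    # first loss: commodity information
    second = softmax_cross_entropy(image_logits, label)  # second loss: commodity picture
    return weight * first + (1.0 - weight) * second      # third loss: assumed weighted sum
```

With uninformative logits for two classes, both losses equal ln 2, so the fused loss is ln 2 regardless of the weight.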
  • A third aspect of the embodiments of the present application provides a commodity data management system, including a first server, a second server, and a database.
  • The first server is configured to acquire commodity data and divide it into at least one first sub-file, where each first sub-file contains attribute data of at least one commodity.
  • The second server is configured to perform attribute data verification on each first sub-file and store the verified attribute data in the database.
  • As in the first aspect, the CP only needs to provide the commodity data in the required format for it to be imported offline into the database. Because the CP already has to organize its commodity data in practice, whether for inventory management or for listing on e-commerce platforms, arranging it in the required format involves little extra work.
  • The commodity management system splits the commodity data into multiple sub-files (the first sub-files), performs attribute data verification on each sub-file, and uploads the attribute data that passes verification. The second server may verify the sub-files serially or in parallel; with parallel processing, it verifies multiple sub-files at the same time, which improves verification efficiency.
  • The embodiments of the present application therefore greatly lower the technical threshold for the CP and offer better usability.
  • Automatic verification and storage of commodity data also greatly improve the efficiency of commodity data management.
  • The first server is the server that executes S102 to S1032.
  • The second server is the server that executes S104 to S109.
  • Performing attribute data verification on each first sub-file and storing the verified attribute data in the database specifically includes:
  • The second server selects one sub-file from the at least one first sub-file as the second sub-file.
  • The second server performs attribute data verification on the second sub-file and uploads the verified attribute data in the second sub-file to the database.
  • After completing verification of the second sub-file, the second server returns to the operation of selecting one sub-file from the at least one first sub-file, until all first sub-files have been verified.
  • The second server thus cyclically selects and processes each of these sub-files in turn (as the second sub-file), verifying the data in each one and synchronously storing the commodity data in the sub-files into the database.
  • This achieves automatic verification and storage of commodity data and greatly improves the efficiency of commodity data management.
  • The second server may be a specific server, or any server in a server cluster that includes multiple servers.
  • In one implementation, the second server is any server in the server cluster.
  • Multiple servers can then process and verify sub-files simultaneously. Compared with a single server, this greatly improves the speed and reliability of sub-file verification, and therefore the efficiency of warehousing the commodity data.
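Cluster processing can be sketched with threads standing in for servers. Each hypothetical "server" walks the same subtask list and processes a subtask only if it wins that subtask's lock, so every subtask is processed exactly once even though several servers run at the same time. Threads and an in-process lock table are assumptions for illustration; real servers would coordinate through the shared database and cache component.

```python
import threading
from typing import Dict, List, Tuple


def run_cluster(task_ids: List[str], n_servers: int) -> List[Tuple[str, str]]:
    """Simulate n_servers processing the same subtasks; return (server, task) pairs."""
    owners: Dict[str, str] = {}  # stand-in for the cache component's lock table
    mutex = threading.Lock()
    processed: List[Tuple[str, str]] = []

    def try_lock(task_id: str, server: str) -> bool:
        with mutex:
            if task_id in owners:
                return False
            owners[task_id] = server
            return True

    def server_loop(server: str) -> None:
        for task_id in task_ids:
            if try_lock(task_id, server):
                # Verification of the corresponding sub-file would happen here.
                with mutex:
                    processed.append((server, task_id))

    threads = [
        threading.Thread(target=server_loop, args=(f"server-{i}",))
        for i in range(n_servers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Which server ends up with which subtask depends on thread scheduling; only the exactly-once property is guaranteed.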
  • In a second possible implementation of the third aspect, before one sub-file is selected from the at least one first sub-file as the second sub-file, the following is further included:
  • The first server is further configured to create, in the database, first subtasks in one-to-one correspondence with the first sub-files.
  • The second server selecting one sub-file from the at least one first sub-file as the second sub-file includes:
  • The second server determines a second subtask from the first subtasks stored in the database and acquires the sub-file associated with the second subtask from the at least one first sub-file.
  • The second server returning to the operation of selecting one sub-file from the at least one first sub-file until all first sub-files have been verified includes:
  • The second server returns to the operation of determining a second subtask from the first subtasks stored in the database until all first subtasks have been executed.
  • By creating a subtask (the first subtask) for each sub-file and selecting from them the subtask to be executed (the second subtask), the database can effectively record the verification status of each sub-file, and the second server can easily determine which sub-file to verify each time.
  • When the second server is any server in the server cluster, creating subtasks in the database greatly simplifies the acquisition and verification of sub-files by each server in the cluster and further improves the processing efficiency of the sub-files.
  • The operation of determining a second subtask from the first subtasks stored in the database includes:
  • The second server sends a task query request to the database.
  • The database selects the subtasks to be executed from the first subtasks and sends them to the second server, where the subtasks to be executed include unexecuted first subtasks and first subtasks that are being executed and whose execution duration exceeds the duration threshold.
  • The second server determines the second subtask from the received subtasks to be executed.
  • Whether a subtask is to be executed is determined based on its execution state.
  • Unexecuted subtasks need to be processed by a server.
  • In addition, a server may be unable to process a subtask normally, for example because of downtime.
  • In that case the subtask remains in the executing state but is never completed: waiting longer for that server will neither finish the subtask nor verify its sub-file, so the subtask must be reprocessed by another server.
  • Based on these two considerations, both unexecuted subtasks and subtasks that are being executed but have exceeded the execution duration threshold are treated as subtasks to be executed.
  • The server obtains all current subtasks to be executed in real time and determines the second subtask from among them.
  • the second server determines the operation of the second subtask from the received subtasks to be executed, include:
  • the second server sequentially requests the cache component for distributed locks for each subtask to be executed.
  • the subtask to be executed is regarded as the second subtask.
  • multiple servers may be used to process each subtask at the same time.
  • the server that is the execution body of each solution in the first aspect is also a server that processes subtasks.
  • multiple servers may select the same subtask for processing at the same time. In this case, the processing efficiency of the commodity data is reduced.
  • after receiving the subtasks to be executed, the server first selects one of them and tries to apply to the cache component for the distributed lock of that subtask. The distributed lock of a single subtask can only be assigned to a single server.
  • if no other server has applied for the distributed lock of the subtask, the lock can theoretically be obtained at this time.
  • if another server has already applied for it, the cache component will have recorded that the subtask's distributed lock is held by that server, so the lock cannot be acquired at this time.
  • if the distributed lock is acquired successfully, the server determines that the subtask is the one to be executed this time and downloads the corresponding sub-file. Conversely, if acquiring the distributed lock fails, the subtask selection operation is executed again to choose another suitable subtask.
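The lock-claiming steps above can be sketched as follows, assuming a hypothetical in-memory `InMemoryLockService` standing in for the cache component. It only illustrates the rule that each subtask's lock goes to one server and that a failed attempt moves on to the next subtask to be executed; it is not a distributed implementation.

```python
import threading
from typing import Dict, Iterable, Optional


class InMemoryLockService:
    """Hypothetical stand-in for the cache component: each lock key can be
    held by at most one server at a time."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()  # makes check-and-assign atomic in-process

    def try_acquire(self, key: str, server_id: str) -> bool:
        with self._mutex:
            if key in self._owners:
                return False  # already assigned to some server
            self._owners[key] = server_id
            return True

    def release(self, key: str, server_id: str) -> None:
        with self._mutex:
            if self._owners.get(key) == server_id:  # only the owner may release
                del self._owners[key]


def claim_second_subtask(pending_ids: Iterable[str], locks: InMemoryLockService,
                         server_id: str) -> Optional[str]:
    """Try each subtask to be executed in turn; the first one whose lock is
    obtained becomes the second subtask. Returns None if all are taken."""
    for task_id in pending_ids:
        if locks.try_acquire(f"subtask:{task_id}", server_id):
            return task_id
    return None
```

A real cache component has to provide the same check-and-assign atomically across machines, which the single mutex only approximates within one process.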
  • the second server is further configured to:
  • release the distributed lock on the second sub-file when the verification duration of the second subtask reaches the duration threshold.
  • this prevents a faulty server from occupying the subtask for a long time, which would reduce the execution efficiency of the subtask.
  • the server itself measures how long it has spent verifying the subtask and determines whether the duration threshold has been reached. If it has, the server's verification of the subtask has timed out, possibly because the server is faulty. The distributed lock on the sub-file is therefore released at this point so that other servers can execute the subtask. This implements automatic node takeover of subtasks and greatly enhances the reliability of subtask execution.
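The server-side timeout described above might look like the sketch below, assuming the sub-file is verified row by row. `verify_with_self_timeout`, `verify_row` and `release_lock` are hypothetical names, and `clock` is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Iterable


def verify_with_self_timeout(
    rows: Iterable[dict],
    verify_row: Callable[[dict], bool],
    release_lock: Callable[[], None],
    duration_threshold: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Verify a sub-file row by row while timing the work.

    Returns True if every row was processed; returns False after releasing the
    lock if the verification duration reached the threshold first.
    """
    start = clock()
    for row in rows:
        if clock() - start >= duration_threshold:
            release_lock()  # give up the sub-file so another server can take over
            return False
        verify_row(row)  # result handling (upload, exception info) omitted here
    return True  # on success the caller would mark the subtask as executed
```

Because the check runs only between rows, a server that hangs completely never reaches it; that case is covered by the cache component timing the lock itself.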
  • the commodity data management system further includes: a cache component.
  • the cache component is configured to start timing after allocating the distributed lock on the second sub-file to the second server.
  • the cache component is further configured to release the distributed lock on the second subfile when the timing duration reaches the duration threshold.
  • this prevents the subtask from being occupied by a server for a long time, which would reduce the execution efficiency of the subtask.
  • while allocating the distributed lock, the cache component also times it and determines whether the duration threshold has been reached. If it has, the server has timed out while verifying the subtask and may be faulty, so the distributed lock on the sub-file is released at this point and other servers can execute the subtask. This implements automatic node takeover of subtasks and greatly enhances the reliability of subtask execution.
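The cache-component side can be sketched as a lock that records when it was granted and counts as released once the duration threshold has passed. `LeaseLockService` is a hypothetical in-memory stand-in; expiry is checked lazily whenever the lock is queried.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseLockService:
    """Hypothetical cache component: starts timing when it assigns a lock and
    treats the lock as released once the duration threshold has elapsed."""

    def __init__(self, duration_threshold: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = duration_threshold
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, granted_at)

    def _expire(self, key: str) -> None:
        entry = self._locks.get(key)
        if entry is not None and self._clock() - entry[1] >= self._threshold:
            del self._locks[key]  # timed out: free the sub-file for another server

    def try_acquire(self, key: str, server_id: str) -> bool:
        self._expire(key)
        if key in self._locks:
            return False
        self._locks[key] = (server_id, self._clock())  # timing starts here
        return True

    def owner(self, key: str) -> Optional[str]:
        self._expire(key)
        entry = self._locks.get(key)
        return entry[0] if entry is not None else None
```

With Redis as the distributed cache, the same effect is commonly obtained with `SET key value NX PX <milliseconds>`, which sets the key only if it is absent and lets it expire after the given time.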
  • the second server obtains exception information for the attribute data that fails verification and stores the exception information in the database.
  • the parsed attribute data is checked for validity, that is, it is determined whether each piece of attribute data has problems such as missing data or data errors. If such problems exist, verification of that attribute data fails.
  • the embodiment of the present application uploads the exception information corresponding to data parsing exceptions to the database, where it is recorded. On this basis, the exception information can be fed back to the CP, or the CP can query it directly. The CP can thus quickly learn which data has problems and carry out targeted checking and re-entry, which improves the efficiency of warehousing the commodity data.
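A validity check that records exception information instead of silently dropping bad rows could look like this. The attribute names follow the required attributes of Table 3; the specific rules (price must be a non-negative number) and the exception record layout are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Required attributes as listed for Table 3; the names are assumed column headers.
REQUIRED = ("ID", "category", "name", "price", "picture address")


def validate_rows(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split parsed rows into (passed, exceptions). Each exception record names
    the row, the attribute and the reason, ready to be stored for the CP to query."""
    passed, exceptions = [], []
    for index, row in enumerate(rows, start=1):
        # Missing data: a required attribute is absent or blank.
        problems = [(f, "missing required attribute") for f in REQUIRED if not row.get(f, "").strip()]
        # Data error: assumed rule that price must be a non-negative number.
        price = row.get("price", "").strip()
        if price:
            try:
                if float(price) < 0:
                    problems.append(("price", "negative price"))
            except ValueError:
                problems.append(("price", "price is not a number"))
        if problems:
            exceptions.extend({"row": str(index), "attribute": f, "reason": r} for f, r in problems)
        else:
            passed.append(row)
    return passed, exceptions
```

Only `passed` would be uploaded as commodity information; `exceptions` is what the CP later queries.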
  • the commodity data is data in a data table format.
  • the format of the commodity data is set to a data table format. The data table is a common data recording format and is the format many CPs already use when organizing commodity data day to day. Therefore, if the commodity data is required to be provided as a data table, the CP only needs to tidy up its original commodity data slightly to obtain the commodity data required by the commodity management system. This greatly lowers the technical threshold and workload for the CP, thereby improving the efficiency of commodity data management.
  • the operation of performing attribute data verification on the second sub-file and uploading the verified attribute data in the second sub-file to the database includes:
  • the second server performs attribute data verification on the second sub-file.
  • the second server uploads the attribute data that has passed the verification in the second sub-file to the database.
  • the attribute data in the commodity data includes the download address of the commodity image
  • the method further includes:
  • the second server downloads the image of the product according to the download address of the image of the product contained in the attribute data that has passed the verification.
  • the second server performs image feature analysis on the product image to obtain image feature data.
  • the second server stores the image feature data in the feature library.
  • image feature analysis is performed on warehoused commodities and the results are stored in the feature library for subsequent commodity searches by users. The embodiments of the present application can therefore provide data support for subsequent commodity searches.
  • a fourth aspect of the embodiments of the present application provides a commodity data management device, including:
  • the commodity data acquisition module is used for acquiring commodity data, and dividing the commodity data into at least one first sub-file, wherein each first sub-file contains attribute data of at least one commodity.
  • the storage module is used to perform attribute data verification on each first sub-file, and store the verified attribute data in the database.
  • the storage module includes:
  • the file selection module is used for selecting a sub-file from the at least one first sub-file as the second sub-file.
  • the data verification module is used for performing attribute data verification on the second sub-file, and uploading the verified attribute data in the second sub-file to the database.
  • the operation of selecting one sub-file from the at least one first sub-file as the second sub-file is then executed again, until all the first sub-files have been verified.
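The loop performed by the storage module (select a second sub-file, verify it, upload the rows that pass, repeat until every first sub-file is verified) reduces to the following sketch. `warehouse_subfiles`, `verify` and `upload` are hypothetical; selection here simply takes the next unverified sub-file, whereas the multi-server variant claims sub-files through distributed locks.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def warehouse_subfiles(subfiles: List[List[Row]],
                       verify: Callable[[Row], bool],
                       upload: Callable[[List[Row]], None]) -> int:
    """Repeatedly select a not-yet-verified sub-file (the 'second sub-file'),
    verify its rows, and upload only the rows that pass. Returns rows uploaded."""
    remaining = list(range(len(subfiles)))
    uploaded = 0
    while remaining:                      # loop until every first sub-file is verified
        index = remaining.pop(0)          # file selection: here simply the next one
        passed = [row for row in subfiles[index] if verify(row)]
        if passed:
            upload(passed)                # only verified attribute data reaches the database
        uploaded += len(passed)
    return uploaded
```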
  • the loop module is used to obtain a second sub-file, where the second sub-file is a sub-file selected from at least one first sub-file.
  • a fifth aspect of the embodiments of the present application provides a commodity search device, including:
  • the picture receiving module is used for receiving the first commodity picture uploaded by the user terminal.
  • the image analysis module is configured to perform image feature analysis on the first commodity picture to obtain first image feature data.
  • the feature matching module is configured to determine at least one second image feature data with the highest feature matching degree with the first image feature data from the image feature data stored in the feature library.
  • a commodity search module, configured to send, to the user terminal, the second commodity pictures in one-to-one correspondence with the at least one piece of second image feature data, together with the attribute data associated with those pictures, where the sent second commodity pictures and associated attribute data are sorted based on the trademark information contained in the first commodity picture.
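Feature matching and trademark-based ordering can be sketched with cosine similarity over plain feature vectors. Cosine similarity is only an assumed matching measure, the feature vectors and trademark labels are taken as already extracted, and `search_by_image` is a hypothetical name; the source does not specify how either is computed.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_by_image(query: Sequence[float],
                    library: Dict[str, Tuple[List[float], str]],
                    k: int, query_trademark: Optional[str]) -> List[str]:
    """library maps picture ID -> (feature vector, trademark). Take the k most
    similar pictures, then list those whose trademark matches the query's first."""
    top = sorted(library, key=lambda pid: cosine(query, library[pid][0]), reverse=True)[:k]
    # Stable sort: trademark matches first, similarity order preserved within each group.
    return sorted(top, key=lambda pid: library[pid][1] != query_trademark)
```

Sorting after the top-k cut keeps the similarity threshold unchanged while still promoting commodities of the recognized brand.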
  • a sixth aspect of the embodiments of the present application provides a server. The server includes a memory and a processor, the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the server implements the steps of the commodity data management method according to any one of the above-mentioned first aspects, or alternatively the steps of the commodity search method according to any one of the above-mentioned second aspects.
  • a seventh aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the server implements the steps of the commodity data management method according to any one of the foregoing first aspects, or alternatively the steps of the commodity search method according to any one of the above-mentioned second aspects.
  • an eighth aspect of the embodiments of the present application provides a computer program product which, when run on a server, causes the server to execute the commodity data management method according to any one of the first aspects above.
  • alternatively, the server is made to implement the steps of the commodity search method according to any one of the above-mentioned second aspects.
  • a ninth aspect of the embodiments of the present application provides a chip system. The chip system includes a processor coupled to a memory, and the processor executes a computer program stored in the memory to implement the commodity data management method according to any one of the foregoing first aspects.
  • alternatively, the server is made to implement the steps of the commodity search method according to any one of the above-mentioned second aspects.
  • the chip system may be a single chip or a chip module composed of multiple chips.
  • FIG. 1A is a schematic diagram of a commodity search interface provided by an embodiment of the present application.
  • FIG. 1B is a schematic diagram of a commodity data upload interface provided by an embodiment of the present application.
  • FIG. 2A is a system interaction diagram of a commodity management system provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2C is a schematic flowchart of applying for a subtask distributed lock in the commodity data management method provided by the embodiment of the present application.
  • FIG. 2D is a schematic flowchart of sub-file verification in the commodity data management method provided by the embodiment of the present application.
  • FIG. 2E is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 3A is a system interaction diagram of a commodity management system provided by an embodiment of the present application.
  • FIG. 3B is a commodity management service architecture diagram of a commodity management system provided by an embodiment of the present application.
  • FIG. 3C is an interaction diagram of a commodity management service scenario of a commodity management system provided by an embodiment of the present application.
  • FIG. 4A is a schematic flowchart of commodity detection in the commodity data management method provided by the embodiment of the present application.
  • FIG. 4B is a schematic flowchart of image search in the commodity data management method provided by the embodiment of the present application.
  • FIG. 4C is a schematic diagram of a logical architecture of commodity search provided by an embodiment of the present application.
  • FIG. 4D is a schematic diagram of a logical architecture of commodity search provided by an embodiment of the present application.
  • FIG. 5A is a system interaction diagram when performing text search in a commodity management system provided by an embodiment of the present application.
  • FIG. 5B is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 5C is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 5D is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 5E is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 6 is a system interaction diagram during image search in a commodity management system provided by an embodiment of the present application.
  • Scenario 1: merchants list products on e-commerce platforms to gain exposure and sales. In practice, however, there are many e-commerce platforms, and each often carries a large number of commodities. As a result, the exposure of a merchant's products is often low, and so is the probability that users learn about or buy them.
  • Scenario 2: a single e-commerce platform or sales website may host many merchants and the products they list. E-commerce platforms can push or display products to users through various recommendation or sorting algorithms. However, the pushed or displayed products usually account for a very small proportion of all listed products. For example, a single mainstream e-commerce platform may list hundreds of millions of products, but only thousands may be pushed or displayed to users. Most commodities are then hard for users to discover or buy. The exposure e-commerce platforms give to commodities is therefore low, which is not conducive to the development of the platforms.
  • FIG. 1A is a schematic diagram of a product search interface.
  • the user can input the product name in the interface shown in FIG. 1A and search for the product according to the requirements.
  • the premise of commodity search is that the content provider (Content Provider, CP) provides commodity data to the commodity management system and the commodity data is stored in the database (also known as warehousing), so that the commodity management system can use this commodity data to provide users with a commodity search service.
  • FIG. 1B is a schematic diagram of a product data uploading interface.
  • the CP can organize the commodity data according to its own needs and the requirements of the commodity management system, and then upload the commodity data to the database of the commodity management system.
  • the embodiments of the present application achieve the effects of automatic verification and storage of commodity data.
  • the CP needs to organize the structured commodity data by itself and upload it to the storage bucket of the commodity management system.
  • the embodiment of the present application greatly reduces the technical threshold of CP commodity data operation, and realizes efficient management of commodity data. Thus, the problem of low efficiency of commodity data management due to complicated and cumbersome commodity data operations is avoided.
  • Commodity data: data composed of attribute data of one or more commodities; for example, commodity data can consist of commodity names, prices, and links.
  • the quantity of commodities specifically included in the commodity data may be determined by the CP according to the actual situation.
  • when setting the format of the commodity data, the technical personnel can also set the requirements for providing attribute data of the commodity.
  • for example, a requirement can specify mandatory and optional attribute data.
  • the CP provides attribute data according to actual needs. Therefore, the type and quantity of attribute data actually included in the commodity data are determined by the attribute data provision requirements set by the technical personnel in the actual application and by the data the CP provides; no specific restriction is imposed here.
  • the embodiments of the present application do not specifically limit the content of the attribute data provision requirements, which can be set by technical personnel according to actual needs.
  • when the CP terminal uploads these attribute data in the form of files, the commodity data may also be called commodity files.
  • the commodity data may be structured data or unstructured data. Because in the embodiment of the present application, the commodity data will be divided into multiple sub-files for verification, and the attribute data that has passed the verification will be stored in the database. Therefore, regardless of whether the commodity data is structured or unstructured data, in the embodiments of the present application, efficient verification and storage of commodity data can be achieved, thereby improving commodity data management efficiency.
  • the process of storing the attribute data in the sub-file into the database is still the process of structured storage of the commodity data.
  • for example, the attribute data provision requirements include: product serial number (Identity document, ID), category, name, price, picture address, web page link, and application (Application, App) link.
  • the product ID, category, name and price are all required attribute data, while picture address, web page link and App link are optional attribute data.
  • the image address refers to the download address of the product image.
  • the CP can prepare the attribute data of the commodity according to the above requirements, and organize the corresponding data table according to the actually obtained commodity attribute data.
  • the commodity data provided by the CP can then be as shown in Table 1 below (4 commodities in this example):
  • Commodity data is the original data provided by CP.
  • the embodiment of the present application will store the commodity data in a database (that is, put in a warehouse).
  • the embodiment of the present application refers to the commodity data stored in the database as commodity information.
  • if the commodity data includes pictures or other data that cannot be stored in the database in a structured manner, this part of the data is treated as data other than the commodity information and stored somewhere other than the database, for example on a network storage platform.
  • the commodity data to be managed at this time consists of commodity information and unstructured data.
  • the commodity information is all text-type information.
  • the database is a data warehouse that organizes, stores and manages data according to the data structure.
  • the database is used to store commodity information.
  • Structured data: also known as row data.
  • structured data is data that is logically expressed and implemented by a two-dimensional table structure (that is, data stored in the form of a two-dimensional table) and strictly follows data format and length specifications.
  • it is mainly stored and managed through relational databases.
  • the commodity information is structured commodity data, that is, it belongs to structured data. Since the database uses a certain data structure for data storage, the process of putting the commodity data into the warehouse is to store the commodity data in accordance with the database structure requirements. It can be seen from this that the warehousing process already includes the structuring of commodity data.
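Structured storage of verified attribute data into a two-dimensional table can be illustrated with Python's built-in `sqlite3` module standing in for MySQL, Oracle or SQL Server. The table name and columns are hypothetical, chosen to match the required attributes of Table 3.

```python
import sqlite3
from typing import Dict, List

# Hypothetical column names mirroring the required attributes of Table 3.
COLUMNS = ("id", "category", "name", "price", "picture_address")


def store_commodity_info(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> int:
    """Write verified attribute data as rows of a two-dimensional table and
    return the number of commodities now stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commodity ("
        "id TEXT PRIMARY KEY, category TEXT NOT NULL, name TEXT NOT NULL, "
        "price REAL NOT NULL, picture_address TEXT NOT NULL)"
    )
    # INSERT OR REPLACE lets a re-uploaded commodity overwrite its earlier row.
    conn.executemany(
        "INSERT OR REPLACE INTO commodity VALUES (?, ?, ?, ?, ?)",
        [tuple(row[c] for c in COLUMNS) for row in rows],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM commodity").fetchone()[0]
```

The table schema is what enforces the structure: each commodity becomes one row, and each attribute one typed column.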
  • the specific terminal equipment where the database is located is not limited here, for example, it may be located on a single server or in a server cluster.
  • the type of the database is not specifically limited here and can be selected or set by the technical personnel according to actual needs.
  • for example, it can be MySQL, Oracle or SQL Server.
  • the structural style of the two-dimensional table storing commodity information in the database can be determined according to the type of the specific database, which is not limited here.
  • Format (format of commodity data): In order to facilitate structured storage of commodity data (ie, storage in a database), the format of commodity data is set in advance by a technician in this embodiment of the present application. On this basis, the CP needs to organize the attribute data of the product according to this format, so as to obtain the product data that meets the requirements. For example, if the format is a data table, the CP needs to record the attribute data of the commodity into the data table, so as to obtain commodity data in the data table format.
  • the embodiments of the present application do not specifically limit the format, which can be set by technical personnel.
  • technicians can set the format of commodity data as a data table, and set corresponding attribute data provision requirements (eg, which attribute data needs to be provided, and which can be provided or not provided).
  • the CP needs to sort out the attribute data of the commodity according to the requirements for providing attribute data, and record the sorted attribute data into the data table.
  • technicians can implement the commodity data format and the attribute data provision requirements in the form of a data table template, that is, the technical personnel preset the required attributes in the data table template.
  • for the data table template, refer to Table 2 below:
  • the first row of the data table template is filled in in advance by the technician with each attribute (that is, attribute 1 to attribute 7) that the CP needs to provide.
  • the specific selected attributes are not limited here, and can be set by technical personnel. For example, it can be the ID, category, name, price, picture address, web page link, and App link in Table 1, or other attributes.
  • the CP needs to fill in or import various attribute data of the commodity in the data table template according to the preset attributes in the data table template to complete the input of the data table template.
  • the format of the commodity data is set by the technician first, and then the CP organizes the attribute data of the commodity according to the format, and obtains the commodity data that satisfies the format. On this basis, the CP uploads the sorted commodity data to the commodity management system, and the commodity management system stores the commodity data. It can be seen that the commodity data uploaded by the CP (also the original commodity data processed by the commodity management system) is the data that meets the format requirements.
  • CP terminal refers to the terminal device used by CP to upload product data.
  • this embodiment of the present application does not specifically limit the device type of the CP terminal, which can be determined according to actual application scenarios. For example, it can be a desktop computer, a laptop computer, a tablet computer, or a mobile phone.
  • the terminal devices the CP uses to prepare and to upload commodity data need not be the same device. For example, the CP can prepare commodity data on a laptop and then upload it from a mobile phone. Therefore, the CP terminal needs the ability to upload data, but does not necessarily need the ability to add, delete, modify, verify, or reformat commodity data.
  • User terminal refers to the terminal device used by the user to search for goods.
  • users can search for commodities by entering text or pictures in the user terminal and uploading them to the commodity management system.
  • the user may be a consumer or other personnel, such as a CP, which needs to be determined according to the application scenario.
  • this embodiment of the present application does not specifically limit the device type of the user terminal, which can be determined according to the actual application scenario. For example, it can be a desktop computer, a laptop computer, a tablet computer, a mobile phone, or a wearable device.
  • NSP (Network Storage Platform):
  • devices with data storage capability and data transmission capability can be used as NSPs in the embodiments of the present application.
  • the specific device type and quantity of NSPs can be selected or set by technicians according to actual needs. For example, it can be a single server with document and picture storage function, or a server cluster with document and picture storage function.
  • Feature library: to support image search for commodities, the embodiment of the present application may perform image feature analysis on commodity pictures to obtain image feature data, which is used for image matching during image search.
  • the feature library refers to a data warehouse for storing image feature data.
  • data other than image feature data can also be stored in the feature library, as set by the technical personnel according to their needs.
  • the embodiment of the present application does not specifically limit the terminal device on which the feature library is located; it can be selected or set by technicians according to actual needs. For example, it can be on a single server or in a server cluster.
  • the feature library may also be referred to as a commodity base library.
  • Cache component: provides distributed lock management services.
  • a locking mechanism may be applied before a server performs a task: the server first applies to the cache component for the task's distributed lock. Once the distributed lock is obtained, the task is locked and the server can perform it. Correspondingly, other servers can no longer obtain the task's distributed lock from the cache component and cannot execute the task.
  • the way the cache component implements distributed locks is not specifically limited and can be set by technicians according to actual needs.
  • for example, distributed locks can be implemented based on a distributed cache (DCS Redis) or based on ZooKeeper.
  • the embodiment of the present application does not specifically limit the terminal device on which the cache component is located; it can be selected or set by technicians according to actual needs. For example, it can exist as a component in the server.
  • when distributed locks are implemented based on a distributed cache, the cache component may also be referred to as a DCS distributed lock.
  • the format of the commodity data set by the technician is a data table.
  • the two parts of commodity data management and commodity search by users will be described through specific embodiments.
  • Part 1: management of commodity data by the commodity management system.
  • the commodity management system includes: at least one server, a database, and an NSP.
  • the commodity management system may further include a cache component and a feature library.
  • Fig. 2A shows the system interaction diagram of the commodity management system during data management of commodity data, which is described in detail as follows:
  • the CP terminal uploads commodity data to the NSP.
  • here, a data table template is taken as an example of the set format. Assume the data table template provided by the technician is Table 3 below:
  • ID, category, name, price and picture address are required attribute data
  • currency ID and picture ID are optional attribute data.
  • ID refers to the product serial number.
  • Category is the category of goods, which can be classified according to different needs. For example, it can be divided into: clothing, digital appliances, shoes, bags, home, toys, beauty, accessories, food and other categories.
  • the currency identification is the identifier of the currency in which the price is denominated. For example, RMB can be ¥, USD can be $, and GBP can be £.
  • the image address refers to the download address of the product image. In practical applications, the CP may provide many product pictures, and uploading them one by one would be cumbersome. Therefore, this embodiment of the present application provides the image address attribute.
  • the CP can store the picture of the product in some servers, and fill in the corresponding picture address in the data table template to complete the provision of the picture of the product.
  • Image ID refers to the serial number of the product image.
  • a web page link (weburl) refers to a link of a product sales webpage, and by opening the link, a browser can be opened and a corresponding product sales webpage can be entered.
  • the web page link can be an ordinary web page link or an HTML5 web page link.
  • App link refers to the link to the sales page of the product in the App. By opening this link, you can open the corresponding App and jump to the product sales page in the App.
  • the quick app link refers to the link to the sales page of the product in the quick app.
  • one or more links can be filled in the webpage link, App link and quick application link to realize the jump to different e-commerce platforms.
  • for example, if three different webpage links are provided, corresponding to the product sales webpages of three different e-commerce platforms, the user can jump to any of the three platforms.
  • the CP can fill in or import commodity attribute data according to the actual situation to prepare the commodity data. In practical applications, the CP generally organizes commodity data as part of commodity inventory management, so the CP's original commodity data can simply be imported into the data table template here, and the CP's workload is very small.
  • the actual number of commodities may be a few, or thousands or tens of thousands, which needs to be determined by the CP according to the actual situation.
  • the above attribute data needs to be filled in for each product.
  • each row in the table represents the data of one product. Therefore, the number of rows in Table 3 also needs to be determined according to the actual quantity of commodities. For example, referring to Table 1, the number of commodities is 4 at this time.
  • a product description attribute may also be added.
  • the CP can fill in some descriptions of the product in Table 3 to help users understand the product in depth. For example, the CP can fill in "this product is green organic food".
  • the CP can upload the commodity data to the NSP through the CP terminal.
  • the commodity data is a table file in which Table 3 (Table 3 after filling in or importing attribute data) is recorded.
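Reading such a table file into per-commodity records might look like this, assuming the CP exports Table 3 as CSV with attribute names in the first row; `read_commodity_table` is a hypothetical helper, and spreadsheet formats would need a different parser.

```python
import csv
import io
from typing import Dict, List


def read_commodity_table(text: str) -> List[Dict[str, str]]:
    """Parse a table file whose first row holds the attribute names (as in the
    data table template) and each further row holds one commodity."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        # Skip overflow cells (key None) and normalise missing cells to "".
        rows.append({k: (v or "").strip() for k, v in raw.items() if k is not None})
    return rows
```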
  • the CP can upload the form files to the NSP through devices such as mobile phones or computers.
  • a portal (Portal) website for uploading commodity data may be preset.
  • the portal website can be accessed through the CP terminal, and the commodity data can be uploaded through the portal website interface to complete the upload.
  • the CP terminal may not be able to directly perform data transmission with the NSP in some cases.
  • in this case, an intermediate device, such as a server, can be provided.
  • the server provides a callable application programming interface (Application Programming Interface, API) for the CP terminal.
  • the CP terminal sends the commodity data to the intermediate device by calling the API, and the intermediate device uploads the commodity data to the NSP to complete the uploading of the commodity data.
  • each sub-file contains attribute data of at least one commodity.
  • the commodity data is split into multiple sub-files.
  • the format of the sub-files obtained by splitting may be the same as or different from that of the commodity data before splitting.
  • the sub-file can be a data table or a file in other formats. Details are as follows:
  • after the commodity data is uploaded to the NSP, the server downloads it from the NSP and splits it into one or more sub-files.
  • the splitting rules for commodity data are not limited here and can be determined by technical personnel according to actual needs. In theory, the less data a sub-file contains, the faster, more reliable, and more timely the processing of that single sub-file; however, the number of sub-files then grows, which lowers overall processing efficiency. Technical personnel can therefore set the splitting rules according to the actual efficiency and reliability requirements of commodity data management. For example, the rule can be that each sub-file contains m commodities, where m is a positive integer such as 1000.
  • each subfile contains attribute data of m commodities (for the last subfile, the number of commodities may be less than m).
  • alternatively, the number of commodities in each sub-file may be any integer in [1, n], chosen randomly or according to certain rules, where n is an integer greater than 1, such as 1000.
  • because a sub-file contains less commodity attribute data than the full commodity data, the server has a lower probability of error when processing it, which makes processing more reliable.
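The two splitting rules above can be sketched in Python as follows. This is a minimal illustration, assuming the commodity data has already been parsed into a list of rows (one row per commodity); the function names `split_fixed` and `split_random` are hypothetical and not part of the described system.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_fixed(rows: Sequence[T], m: int = 1000) -> List[List[T]]:
    """Fixed rule: every sub-file holds m commodities; the last may hold fewer."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return [list(rows[i:i + m]) for i in range(0, len(rows), m)]


def split_random(rows: Sequence[T], n: int = 1000,
                 rng: Optional[random.Random] = None) -> List[List[T]]:
    """Random rule: each sub-file size is drawn uniformly from [1, n]."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    rng = rng or random.Random()
    chunks: List[List[T]] = []
    i = 0
    while i < len(rows):
        size = rng.randint(1, n)  # inclusive on both ends
        chunks.append(list(rows[i:i + size]))
        i += size
    return chunks
```

With `m = 1000`, commodity data holding 900 commodities produces a single sub-file, matching the example that follows, while 2500 commodities produce sub-files of 1000, 1000, and 500.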
  • the number of sub-files obtained in S102 is one or more.
  • for example, suppose the splitting rule is that each sub-file contains 1000 commodities.
  • suppose also that the commodity data contains fewer than 1000 commodities, such as 900.
  • in that case, the attribute data of all products is placed in a single sub-file.
  • in practical applications, a single piece of commodity data may contain an extremely large number of commodities, for example the attribute data of thousands of products at once.
  • each commodity must therefore be uniquely identifiable.
  • a unique identifier can be added to each item in the subfile.
  • the unique identifier can be added to the commodity data as a new attribute data of the commodity.
  • the CP may provide identifiers such as commodity IDs in the commodity data. However, the CP's behavior is outside the control of the commodity management system, and practice has shown that identifiers provided by the CP may be duplicated, missing, or irregular.
  • such identifiers may therefore not be unique, and their reliability is relatively low.
  • by having the server add a unique identifier to each commodity itself, the reliability of the identifier is ensured, so that commodities can be accurately distinguished from one another.
  • the CP is required to provide commodity data in the form of a data table template.
  • the attribute data of a single product are in the same row, that is, each row is all the attribute data of a product. Therefore, the generated unique identifier can be added to the data table template as the row number attribute data of the product.
  • the line number of the product in the product data is the unique identifier of the product.
  • the embodiments of the present application do not limit the type and generation method of the unique identifier too much, which can be set by technical personnel.
  • for example, the unique identifier can be formed from the upload time of the commodity data and the serial number of the product.
  • the unique identifier can also be a randomly generated, non-repeating string.
  • the length of the unique identifier can be fixed, for example at 16 characters. If a generated identifier is shorter than this length, it is padded with spaces or 0s.
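A minimal sketch of the "upload time plus serial number" identifier, assuming a hypothetical layout: a 10-character `%y%m%d%H%M` timestamp followed by the serial number, left-padded to the 16-character length given above. `make_unique_id` and this layout are illustrative assumptions. Identifiers are only unique if no two uploads share the same minute, so a real system might need a finer timestamp or an extra discriminator.

```python
from datetime import datetime

ID_LENGTH = 16  # fixed identifier length (example value from the text)


def make_unique_id(upload_time: datetime, serial: int, pad: str = "0") -> str:
    """Hypothetical layout: yymmddHHMM timestamp + padded serial number."""
    prefix = upload_time.strftime("%y%m%d%H%M")  # 10 characters
    width = ID_LENGTH - len(prefix)  # 6 characters left for the serial
    body = str(serial)
    if serial < 0 or len(body) > width:
        raise ValueError("serial number does not fit in the identifier")
    # Pad with '0' or ' ' so every identifier has the same fixed length.
    return prefix + body.rjust(width, pad)
```

For example, `make_unique_id(datetime(2021, 9, 7, 10, 30), 42)` returns `"2109071030000042"`. Six serial digits leave room for up to 999999 commodities per upload, which covers the "tens of thousands" mentioned earlier.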
  • the server stores all the sub-files in the NSP, and obtains the download address of each sub-file in the NSP. Based on the download address, subtasks corresponding to subfiles are created in the database, and a parent task containing these subtasks is created at the same time.
  • S103 can be subdivided into S1031 and S1032:
  • S1031 The server stores all the sub-files in the NSP and obtains the download address of each sub-file in the NSP.
  • S1032 Based on the download address, the server creates sub-tasks corresponding to the sub-files one-to-one in the database, and creates a parent task including these sub-tasks at the same time.
  • the server that performs the splitting uploads all the resulting sub-files to the NSP and, while storing them, obtains the download address of each sub-file in the NSP.
  • the NSP and the server that executes S103 are mutually independent devices.
  • after obtaining the download address of each sub-file, the server creates in the database a subtask corresponding one-to-one to each sub-file, and stores each sub-file's download address in the corresponding subtask.
  • the subtask can be executed by the server.
  • the essence of executing the subtask is: the server downloads the subfile corresponding to the subtask through the download address in the subtask, and checks and stores the attribute data in the downloaded subfile.
  • the server can thus process each sub-file effectively and ultimately store the commodity data.
  • all subtasks must be executed for the commodity data to be stored completely. Therefore, while executing subtasks, the server needs to confirm whether all subtasks under a single piece of commodity data have been completed (the confirmation method can be set by technical personnel and is not limited here).
  • the commodity management system may need to process multiple pieces of commodity data at the same time, so the database may store subtasks for multiple pieces of commodity data simultaneously. The number of subtasks is then large, and managing them is more difficult.
  • the server also creates a parent task containing these subtasks when creating subtasks.
  • each product data corresponds to a parent task, and a single parent task contains all subtasks under the corresponding product data.
  • by querying the execution status of each subtask in the parent task corresponding to the commodity data, the efficiency of subtask management can be improved.
  • the embodiment of the present application further records the execution status of each subtask.
  • the execution status of the subtask includes three types: not executed, executing, and executing completed.
  • not executed means that the subtask is not currently executed by any server.
  • Executing means that the subtask is currently being executed by at least one server, and no server has completed the subtask.
  • execution completed means that the subtask has been executed by at least one server. Because, in this embodiment, executing a subtask essentially means checking and storing the attribute data in the corresponding sub-file, not executed means that the attribute data in the sub-file corresponding to the subtask has not yet been checked and stored; executing means that this attribute data is being checked and stored; and execution completed means that it has been checked and stored. Newly created subtasks are marked as not executed in the database.
  • for example, suppose sub-file a and sub-file b are obtained by splitting commodity data A.
  • the server will store the two sub-files in the NSP, and obtain the corresponding download addresses of the two sub-files.
  • the download address of sub-file a in NSP is: https://xxxhuawei.com/filea/huawei.html
  • the download address of sub-file b in NSP is: https://xxxhuawei.com/fileb/huawei.html .
  • the server will create subtask a and subtask b in the database, as well as a parent task A that contains both subtasks (the parent task may have no substantive task content, only the included subtasks are recorded).
  • the download address: https://xxxhuawei.com/filea/huawei.html is stored in subtask a
  • the download address: https://xxxhuawei.com/fileb/huawei.html is stored in subtask b.
  • an identifier or ID may be added to each subtask.
  • when the database and the server interact, they can uniquely identify a subtask by exchanging its identifier or ID.
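The parent task, subtasks, execution statuses, and execution duration described above could be modeled as follows. This is a sketch under stated assumptions: an in-memory dictionary stands in for the database, subtask IDs are random UUID hex strings, timestamps are plain seconds, and names such as `SubTask`, `ParentTask`, and `create_tasks` are hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, Optional


class Status(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    COMPLETED = "execution completed"


@dataclass
class SubTask:
    download_address: str  # where the sub-file is stored in the NSP
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: Status = Status.NOT_EXECUTED  # new subtasks start as not executed
    created_at: float = field(default_factory=time.time)
    last_update_time: Optional[float] = None

    def mark_executing(self, now: float) -> None:
        # Called when the database learns that a server picked this subtask.
        self.status = Status.EXECUTING
        self.last_update_time = now

    def mark_completed(self, now: float) -> None:
        self.status = Status.COMPLETED
        self.last_update_time = now

    def execution_duration(self, now: float) -> float:
        # now() - last_update_time; zero if the subtask was never picked.
        if self.last_update_time is None:
            return 0.0
        return now - self.last_update_time


@dataclass
class ParentTask:
    """One parent task per piece of commodity data; it only records subtasks."""
    subtasks: Dict[str, SubTask] = field(default_factory=dict)

    def all_completed(self) -> bool:
        return all(t.status is Status.COMPLETED for t in self.subtasks.values())


def create_tasks(download_addresses: Iterable[str]) -> ParentTask:
    """Create one subtask per sub-file download address, grouped under a parent."""
    parent = ParentTask()
    for address in download_addresses:
        sub = SubTask(download_address=address)
        parent.subtasks[sub.task_id] = sub
    return parent
```

Passing the two download addresses of sub-file a and sub-file b to `create_tasks` yields parent task A with two not-executed subtasks, each carrying its own address and ID.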
  • the server may delete the commodity data uploaded by the CP terminal in the NSP to save NSP storage space.
  • in S102, the server first acquires a distributed lock on a single piece of commodity data, and only after obtaining the lock (that is, completing the locking) does it download and split the commodity data.
  • S102 can be replaced with: S1021, the server tries to obtain a distributed lock on the commodity data from the cache component. If the lock is obtained, the server downloads the commodity data from the NSP and splits it into one or more sub-files, each containing the attribute data of at least one commodity.
  • multiple servers are responsible for splitting commodity data, which decomposes the task.
  • distributed locks are also introduced. Details are as follows:
  • Each server queries a task list that records commodity data to be processed.
  • each server applies for a distributed lock on the commodity data; that is, the servers compete for the lock.
  • the server that successfully grabs the lock acts as the executor of S102-S103 and downloads the commodity data from the NSP.
  • the commodity data is split to obtain multiple sub-files.
  • All the obtained sub-files are stored in the NSP, and at the same time, based on the download address of the sub-files, sub-tasks corresponding to each sub-file are created in the database.
  • a line number can also be added to each commodity in the sub-file as a unique identifier.
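The lock grab can be sketched as below. This is only an illustration: `InMemoryLockCache` is a hypothetical single-process stand-in for the cache component that mimics set-if-absent semantics with an expiry time, and threads play the role of the competing servers. A production deployment would rely on a real distributed cache instead; the key format `commodity-data:<id>` is likewise an assumption.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple


class InMemoryLockCache:
    """Hypothetical stand-in for the cache component's distributed lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expiry)

    def try_lock(self, key: str, owner: str, ttl: float = 30.0,
                 now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._guard:
            holder = self._locks.get(key)
            if holder is not None and holder[0] != owner and holder[1] > now:
                return False  # another server holds an unexpired lock
            self._locks[key] = (owner, now + ttl)
            return True

    def unlock(self, key: str, owner: str) -> bool:
        with self._guard:
            holder = self._locks.get(key)
            if holder is None or holder[0] != owner:
                return False  # only the owner may release the lock
            del self._locks[key]
            return True


def grab_commodity_data(cache: InMemoryLockCache, servers: List[str],
                        data_id: str) -> List[str]:
    """Every server competes for the same commodity data; returns the winners."""
    winners: List[str] = []

    def compete(server: str) -> None:
        if cache.try_lock(f"commodity-data:{data_id}", server):
            winners.append(server)  # only this server downloads and splits

    threads = [threading.Thread(target=compete, args=(s,)) for s in servers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

The expiry is a design assumption not stated in the text: it keeps a server that crashes mid-split from holding the lock on the commodity data forever.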
  • the server sends a task query request to the database.
  • the embodiment then starts processing the subtasks in order to verify the sub-file corresponding to each subtask.
  • the server first sends a task query request to the database, and the task query request is used to request the database to inform the server of the subtasks to be executed under the current parent task.
  • the data content and format of the task query request, etc. are not limited here, and can be set by technical personnel according to requirements.
  • the database filters out the subtasks to be executed from all subtasks included in the parent task and returns them to the server as a subtask list.
  • whether a subtask is to be executed is determined by its execution status; specifically, the execution status that qualifies a subtask for execution is preset by technical personnel. After receiving the task query request, the database identifies the execution status of each subtask under the parent task and filters out those whose status meets the requirement. For example, if all not-executed subtasks are treated as subtasks to be executed, the database returns every subtask under the parent task whose execution status is not executed.
  • technical personnel can also add other filtering conditions on top of the execution status to distinguish and filter subtasks to be executed more accurately.
  • for example, a limit on the execution duration of a subtask can be added.
  • the database will obtain the execution status and execution duration of the subtasks at the same time, and screen out subtasks whose execution status and execution duration both meet the preset requirements, as subtasks to be executed.
  • the subtasks to be executed can then be defined as follows:
  • unexecuted subtasks and subtasks that are being executed but whose execution time is overdue are regarded as subtasks to be executed.
  • the filtering operation of the subtasks to be executed in S105 can be replaced with:
  • after receiving the task query request, the database filters out, from all subtasks in the parent task, the unexecuted subtasks and the subtasks that are being executed but whose execution duration exceeds the duration threshold.
  • a duration threshold is preset in this embodiment, and a subtask whose execution duration exceeds it is deemed to have timed out.
  • the embodiment of the present application will put these subtasks in the same list (ie, the subtask list), and feed back the subtask list to the server that sends the task query request.
  • each subtask is recorded with a corresponding subfile download address, so that the server can download the subfile for processing.
  • alternatively, instead of compiling a subtask list, the filtered subtasks may be returned to the server directly, or a subtask list containing only one subtask may be returned. This can be set by technical personnel.
  • the server determines the subtask to be executed according to the subtask list.
  • the database records the creation time of each subtask. After the subtasks are filtered out, the subtasks will be prioritized according to their creation time and execution status, and a subtask list will be generated according to the sorting results. Then, the sorted subtask list is fed back to the server.
  • the specific priority rules are not limited here and can be set by technical personnel according to actual needs. For example, not-executed subtasks can be given higher priority than executing subtasks, and within each group, subtasks created earlier have higher priority than those created later.
  • the logic of each operation performed by the database in S105 can be built into the database itself or, as a program, into the terminal device on which the database runs, as set by technical personnel according to actual needs.
  • in the former case, the operation of S105 is completed by the database itself.
  • in the latter case, the terminal device completes the operation of S105.
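The filtering rule of the alternative S105 and the example priority rule can be combined in a short sketch. It assumes timestamps in seconds, an in-memory list in place of the database query, and hypothetical names (`select_pending`, `duration_threshold`); the sort key puts not-executed subtasks first and, within each group, earlier-created subtasks first.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class Status(Enum):
    NOT_EXECUTED = 0
    EXECUTING = 1
    COMPLETED = 2


@dataclass
class SubTask:
    task_id: str
    status: Status
    created_at: float
    last_update_time: Optional[float] = None


def select_pending(subtasks: Sequence[SubTask], now: float,
                   duration_threshold: float) -> List[SubTask]:
    """Filter subtasks to be executed and sort them by priority."""

    def is_pending(t: SubTask) -> bool:
        if t.status is Status.NOT_EXECUTED:
            return True
        # Executing subtasks qualify only once now() - last_update_time times out.
        return (t.status is Status.EXECUTING
                and t.last_update_time is not None
                and now - t.last_update_time > duration_threshold)

    pending = [t for t in subtasks if is_pending(t)]
    # False sorts before True, so not-executed subtasks come first;
    # ties are broken by creation time, earliest first.
    pending.sort(key=lambda t: (t.status is not Status.NOT_EXECUTED, t.created_at))
    return pending
```

With a 60-second threshold at time 100, a subtask that started executing at time 10 is returned after all not-executed subtasks, while one that started at time 95 is left alone.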
  • after receiving the subtask list, the server determines a subtask from it and downloads the corresponding sub-file from the NSP according to the download address in that subtask.
  • after receiving the subtask list, the server selects one subtask as the subtask to execute this time. Once the selection is complete, the server downloads the corresponding sub-file from the NSP according to the download address in the subtask for subsequent data verification.
  • the embodiment of the present application does not limit the selection method of subtasks too much, which can be set by technical personnel according to the actual situation.
  • the first subtask in the subtask list may be selected, or a subtask may be randomly selected.
  • if the database has prioritized the subtasks in the subtask list before sending it, the server can select the subtask with the highest priority. For example, when the list is sorted in descending order of priority, the first subtask is selected.
  • after determining the subtask to execute this time, the server also informs the database that the subtask is being executed (for example, by sending the database an instruction carrying the subtask ID and execution status).
  • the database thus learns that the subtask has been selected for execution by the server.
  • the database then sets the execution status of the subtask to executing and records the time at which it learned this as the subtask's last update time (last_update_time).
  • the execution time of the subtask is equal to the difference between the current time and the last update time, that is, now()-last_update_time.
  • multiple servers may be used to process each subtask in the parent task at the same time.
  • multiple servers may then select the same subtask at the same time, which reduces the processing efficiency of the commodity data.
  • distributed locks are therefore introduced. Specifically, referring to FIG. 2C, S106 can be replaced with:
  • after receiving the subtask list, the server selects a subtask from the list that it has not selected before and applies to the cache component for a distributed lock on that subtask.
  • if the distributed lock on the subtask is acquired, the server stops selecting and downloads the sub-file of the subtask from the NSP according to the download address in the subtask.
  • if the distributed lock on the subtask is not acquired, the server returns to selecting another not-yet-selected subtask from the list and applies to the cache component for a distributed lock on it.
  • that is, after receiving the subtask list, the server first selects a subtask from the list and tries to obtain a distributed lock on it from the cache component.
  • the server may inform the cache component which subtask distributed lock is applying for this time by sending the identifier or ID of the subtask to the cache component.
  • the distributed lock of a single subtask can only be assigned to a single server. Therefore, if the subtask is not processed by other servers, in theory, the distributed lock for the subtask can be obtained at this time.
  • if the subtask is already being processed by another server, then, because a distributed lock must be applied for before execution, the cache component has recorded that the lock on the subtask was granted to that other server, so the lock cannot be acquired. On this basis, after obtaining the distributed lock and completing the locking operation, the server determines that the subtask is the one to execute this time and downloads the corresponding sub-file. Conversely, if acquiring the distributed lock fails, the subtask selection in S1061 is performed again to select another suitable subtask.
  • the embodiment does not strictly limit how subtasks are selected; in theory, it is only necessary that no subtask be selected twice. For example, in some optional embodiments, subtasks may be selected randomly or sequentially. In another optional embodiment, if the database has prioritized the subtasks in the subtask list before sending it, the server can select subtasks in descending order of priority. For example, when the list is sorted from highest to lowest priority, the selection rule may be: select the highest-ranked subtask that has not yet been selected from the subtask list. This prevents any single subtask from going unprocessed for a long time and improves the efficiency of subtask processing.
  • in S1062, the server further informs the database that the current subtask is being executed, so that the database can update the execution status and execution duration of the subtask.
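The lock-based selection loop described above might look like the sketch below, assuming the subtask list arrives already sorted by priority. `SubtaskLocks` is a hypothetical in-process stand-in for the cache component that grants each subtask's lock to at most one server; it omits lock expiry for brevity, and `claim_subtask` is likewise an illustrative name.

```python
import threading
from typing import Dict, Optional, Sequence


class SubtaskLocks:
    """Hypothetical stand-in for the cache component: one owner per subtask."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._owners: Dict[str, str] = {}

    def try_lock(self, task_id: str, server: str) -> bool:
        with self._guard:
            # The first server to ask becomes the owner; later callers fail.
            return self._owners.setdefault(task_id, server) == server


def claim_subtask(sorted_task_ids: Sequence[str], locks: SubtaskLocks,
                  server: str) -> Optional[str]:
    """Try subtasks in priority order, each at most once, until a lock succeeds."""
    for task_id in sorted_task_ids:  # highest priority first, no repeats
        if locks.try_lock(task_id, server):
            return task_id  # lock held: download this subtask's sub-file next
    return None  # every subtask in the list is locked by another server
```

Because each server walks the list from the top and skips locked entries, three servers receiving the same three-item list end up with three different subtasks instead of all contending for the first one.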
  • the server verifies the attribute data of each commodity in the sub-file and stores the attribute data of the commodities that pass verification in the database.
  • for commodities that fail verification, the abnormal information of their attribute data is recorded and stored in the database.
  • after acquiring the sub-file, the server starts verifying the attribute data of each commodity in it, that is, checking whether the attribute data of each product meets the preset requirements.
  • the requirements for commodity attribute data may differ between practical application scenarios. For example, in some scenarios, to suit the display of mainstream terminal devices, the requirements for commodity pictures are relatively strict, and the format and size of the images may be tightly constrained. In other scenarios, to provide users with more comprehensive product data, there may be higher requirements on the types and quantity of attribute data. The embodiments therefore do not strictly limit the specific verification requirements; in practice, technical personnel can preset the requirements for commodity attribute data according to the actual application scenario, and the server then verifies the attribute data of each commodity in the sub-file against them.
  • the attribute data can be verified one commodity at a time.
  • alternatively, multiple commodities can be processed concurrently, that is, the attribute data of several commodities is verified at the same time, which improves verification efficiency.
  • this can be set by technical personnel as needed and is not strictly limited here.
  • the verification rules for the individual attribute data of a single product are not limited here and can be set by technical personnel. For example, in some optional embodiments, each attribute of a commodity may be checked in sequence; in others, several attributes may be checked at the same time, which can improve verification efficiency.
  • the embodiment of the present application will take a single commodity as an operation object to verify attribute data. Therefore, in theory, each product in the sub-file will have a corresponding verification result.
  • the verification result of a single commodity falls into one of two categories:
  • verification passed: all the attribute data of the product meets the requirements.
  • verification failure: the attribute data of the product is abnormal, so that verification fails.
  • for commodities that pass verification, the server stores their attribute data in the database. Because the database stores data in a structured manner, storing attribute data in it is a structured storage process. If the attribute data includes a product image, or the download address of a product image, the corresponding commodity picture is also stored in the NSP in this embodiment.
  • for commodities that fail verification, the server records the corresponding abnormal information, such as which attribute data of the product is abnormal, and reports this abnormal information to the database.
  • recording abnormal commodity attribute data in this way makes it easy for the CP to re-upload commodity attribute data according to the abnormal information, which improves the efficiency of commodity data management.
  • each row in the data table is used to record all attribute data of a commodity.
  • attribute data that must be provided include: product ID, name, category, picture address and web page link.
  • when the server splits the commodity data in S102, the resulting sub-files are also in data table format.
  • a unique ID is generated for each item in the sub-file and added to the item's row as its row number.
  • the data verification operation on the sub-file may include: S1071-S10710.
  • the server reads one line of data in the sub-file and splits it into a first line number and first data.
  • the server will read a single row of data at a time.
  • each read may be set to be non-repeating, in which case the server's operation of reading a line of data in the sub-file is replaced by:
  • the server reads a line of data within the subfile without repetition.
  • after reading the attribute data of a single commodity, the embodiment first extracts its line number (that is, the first line number), obtaining the line number of the commodity and the remaining attribute data.
  • the attribute data required to be provided in the embodiment of this application includes: the ID, name, category, picture address and web page link of the product. Therefore, if the CP provides product data according to this requirement, theoretically, the remaining attribute data at this time is the ID, name, category, picture address and web page link of the product.
  • the server determines whether the first line number has been processed. If processed, return to S1071 to continue processing the next line of data. If it has not been processed, S1073 is executed.
  • a single row of data may be processed multiple times by a single server.
  • for example, the same line of data may be split into several sub-files, all of which are then processed by the same server.
  • another example is the scenario in which the server repeatedly reads the same line of data in S1071 (when non-repeating reads are not set).
  • in such cases, the attribute data of a single product is verified repeatedly, which reduces the efficiency of product data verification.
  • the line number is the unique identifier of the item. Therefore, after parsing the line number of the product, the server will first determine whether it has processed the line number.
  • the server performs attribute data analysis on the first data. If the parsing fails, it is determined that the current row data is abnormal, the corresponding abnormal information is uploaded to the database, and the execution returns to S1071. If the parsing is successful, all attribute data contained in the first data are obtained, and S1074 is executed.
  • the embodiment then parses the attributes of the first data to determine whether its text format is legal. If it can be parsed normally into the ID, name, category, picture address, and web page link of the product, the text format of the first data is legal. Conversely, if parsing fails, the text format of the first data is illegal and cannot be parsed and restored normally.
  • the embodiment uploads the abnormal information corresponding to the parsing failure to the database. The database records the abnormal information so that the abnormality can be fed back to the CP, helping the CP quickly locate the abnormal product and re-provide its attribute data.
  • the embodiment of the present application does not limit too much the data type of the abnormal information, which can be set by a technician according to actual needs.
  • for example, text describing the specific abnormal condition, such as "data parsing abnormality", can be used as the abnormal information.
  • corresponding exception codes may be set in advance for various possible abnormal situations, and the corresponding exception codes may be used as exception information.
  • for example, the exception code for a parsing failure can be set to 2203, in which case the abnormal information is 2203; the exception code and text can also be combined as the abnormal information. The same applies to the abnormal information in the following steps, which is not repeated here.
  • the embodiment then performs validity verification on each of the parsed ID, name, category, picture address, and web page link, that is, it determines whether any attribute data has problems such as missing data or data errors. For the category, for example, suppose the categories are predefined as clothing, digital appliances, shoes, bags, home, toys, beauty, accessories, food, and so on. The embodiment verifies whether the filled-in category belongs to one of these categories: if it does, the validity check of the category passes; if not, the check fails. For example, if "pants" is filled in, it does not belong to the above categories, so the check fails.
  • the embodiments do not strictly limit the rules for validity verification of these attribute data; they can be set by technical personnel. For example, the attribute data can be verified one by one or several at a time, and verification can be set to stop, with the current row data determined to be abnormal, as soon as the validity verification of any attribute data fails.
  • the abnormal information corresponding to a validity verification failure may be the text "commodity parameter verification failed", exception code 2204, or both.
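Steps S1071 to S1073 can be sketched as follows, assuming each sub-file row is a comma-separated line whose first cell is the line number and whose remaining cells are the ID, name, category, picture address, and web page link, in that order. The row layout, the `example.com` URLs in the test data, and the names `parse_line` and `process_rows` are assumptions for illustration; only exception code 2203 comes from the text.

```python
import csv
from typing import Dict, List, Set, Tuple

FIELDS = ("id", "name", "category", "picture_address", "web_link")
PARSE_FAILED = 2203  # exception code for a parsing failure (from the text)


def parse_line(line: str) -> Tuple[str, List[str]]:
    """S1071: split one row into the first line number and the first data."""
    cells = next(csv.reader([line]), [])
    if not cells:
        return "", []
    return cells[0], cells[1:]


def process_rows(lines: List[str], processed: Set[str],
                 anomalies: List[Tuple[str, int]]) -> List[Dict[str, str]]:
    """Return rows that parsed successfully; record failures in `anomalies`."""
    parsed: List[Dict[str, str]] = []
    for line in lines:
        line_no, first_data = parse_line(line)
        if line_no in processed:  # S1072: already handled, go to the next row
            continue
        processed.add(line_no)
        if len(first_data) != len(FIELDS):  # S1073: text format is illegal
            anomalies.append((line_no, PARSE_FAILED))
            continue
        parsed.append(dict(zip(FIELDS, first_data)))
    return parsed
```

Keeping `processed` outside the function lets the same server skip a line number it has already seen even when that line appears in another sub-file.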
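A sketch of the validity verification, stopping at the first failed check as in the rule above. The category set copies the example categories listed in the text (the original list ends with "and so on", so it is incomplete), and the requirement that the picture address and web page link be absolute http(s) URLs is an added assumption; only exception code 2204 comes from the text. `validate_attributes` is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

REQUIRED = ("id", "name", "category", "picture_address", "web_link")
# Example categories from the text; the original list is open-ended.
CATEGORIES = {"clothing", "digital appliances", "shoes", "bags", "home",
              "toys", "beauty", "accessories", "food"}
PARAM_CHECK_FAILED = 2204  # exception code from the text


def _is_http_url(value: str) -> bool:
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate_attributes(item: Dict[str, str]) -> Optional[int]:
    """Return None if every check passes, else the exception code."""
    checks = (
        lambda: all(item.get(key, "").strip() for key in REQUIRED),  # no gaps
        lambda: item["category"] in CATEGORIES,
        lambda: _is_http_url(item["picture_address"]),  # assumed URL rule
        lambda: _is_http_url(item["web_link"]),  # assumed URL rule
    )
    for check in checks:  # run in order and stop at the first failure
        if not check():
            return PARAM_CHECK_FAILED
    return None
```

Running the missing-data check first guarantees that the later checks can index `item` safely.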
  • S1075 determine whether the commodity corresponding to the row data already exists. If the commodity already exists, it is determined that the current row data is abnormal, the corresponding abnormal information is uploaded to the database, and the execution returns to S1071. If the commodity does not exist, execute S1076.
  • otherwise, the server might repeatedly verify the same product, which reduces verification efficiency. Therefore, after the attribute data has been verified, the server determines, according to the commodity ID, whether it has already processed a commodity with that ID.
  • the abnormal information corresponding to an already existing commodity may be the text "the commodity already exists", exception code 2202, or both.
  • the embodiment then tries to download the product image according to the picture address.
  • if the download fails, the embodiment uploads the abnormal information corresponding to the picture address to the database, where it is recorded.
  • the abnormal information corresponding to a download failure may be the text "image download failed", exception code 2303, or both.
  • S1077: Determine whether the volume of the downloaded commodity image exceeds a preset volume threshold. If it does, the current row data is determined to be abnormal, the corresponding abnormal information is uploaded to the database, and execution returns to S1071. If it does not, S1078 is executed.
  • A volume threshold is preset, and the commodity images provided by the CP must not exceed it.
  • The specific value of the volume threshold can be set by technicians according to actual needs, for example 2 MB. After downloading the commodity image, this embodiment determines whether the image volume exceeds the threshold.
  • If it does, the server determines that the image volume is abnormal and uploads the corresponding abnormal information to the database, which records it.
  • The abnormal information corresponding to an image volume exceeding the threshold may be the text "image is too large", the exception code 2305, or both.
  • S1078: Identify whether the format of the commodity image is a first format. If it is not, the current row data is determined to be abnormal, the abnormal information is recorded, and execution returns to S1071. If it is, S1079 is executed.
  • The image formats supported by a single server or terminal device are often limited, and an unsupported format could prevent the server from processing the commodity image or prevent the user from viewing it normally.
  • This embodiment therefore also verifies whether the format of the commodity image is legal.
  • One or more formats are preset as legal formats (that is, the first format), and this step identifies whether the format of the commodity image is one of them. If it is, the verification passes. If it is not, the verification fails: the image format of the commodity is abnormal, and the corresponding abnormal information is uploaded to the database, which records it.
  • The abnormal information corresponding to an image format that is not a legal format may be the text "unsupported picture format", the exception code 2304, or both.
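The two image checks of S1077 and S1078 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the exception codes and the 2 MB example come from the text above, but reading 2 MB as 2 MiB, the `LEGAL_FORMATS` set standing in for the "first format", and format detection by file signature (magic bytes) are all assumptions.

```python
from typing import Optional

IMAGE_TOO_LARGE = 2305      # "image is too large"
UNSUPPORTED_FORMAT = 2304   # "unsupported picture format"

VOLUME_THRESHOLD = 2 * 1024 * 1024  # the 2 MB example, read as 2 MiB (assumption)
LEGAL_FORMATS = {"jpeg", "png"}     # hypothetical "first format" set


def sniff_format(data: bytes) -> Optional[str]:
    """Identify the format from the file signature (JPEG and PNG only)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return None


def check_image(data: bytes) -> Optional[int]:
    """Return None if the image passes S1077 and S1078, else the exception code."""
    if len(data) > VOLUME_THRESHOLD:             # S1077: volume check
        return IMAGE_TOO_LARGE
    if sniff_format(data) not in LEGAL_FORMATS:  # S1078: format check
        return UNSUPPORTED_FORMAT
    return None
```

Checking the volume before the format mirrors the step order in the text, so an oversized image is reported as 2305 even if its format is also unsupported.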
  • This embodiment then uploads the commodity image to the NSP for storage.
  • If the upload fails, this embodiment uploads the abnormal information corresponding to the image upload failure to the database, which records it.
  • The abnormal information corresponding to an image upload failure may be the text "system abnormality", the exception code 1001, or both.
  • After uploading the commodity image, this embodiment uploads the attribute data of the commodity to the database.
  • If uploading the attribute data fails, this embodiment uploads the abnormal information corresponding to the upload failure to the database, which records it. This abnormal information may be the text "system abnormality", the exception code 1001, or both. A new commodity is then selected from the sub-file as the object of attribute data verification.
  • In practice, the attribute data is relatively likely to be uploaded to the database successfully. In that case, the storage operation for the attribute data of the currently verified commodity is complete, and a new commodity is likewise selected from the sub-file for attribute data verification.
  • In this way, each row of data in the sub-file is processed one by one, so that the attribute data of every commodity in the sub-file is verified.
  • If any of these checks fails, this embodiment determines that the attribute data of the currently verified commodity is abnormal, that is, verification of the current commodity fails. If all of S1073-S10710 succeed, the current commodity is determined to have passed verification.
  • In this way, verification of the current sub-file can be stopped in time, allowing the server to continue performing other tasks.
  • S1071 further includes:
  • Verification is performed sequentially on the line number, attribute data format, attribute data validity, ID, commodity image download, image volume, image format, image upload, and attribute data storage. This achieves complete and reliable verification of commodity attribute data, together with storage of the attribute data and accurate recording of abnormal information.
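The ordered, stop-at-first-failure verification of each row described above can be sketched as a loop over a list of check functions. This is a minimal in-memory sketch, not the patent's implementation: the `Database` class, the row dictionaries, and the rule inside `validity_check` are hypothetical; only the exception codes 2202 and 2204 and their texts come from the description. The image checks, uploads, and remaining steps would be further functions appended to the same `checks` list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A check returns None on success or (exception code, message) on failure.
Failure = Optional[Tuple[int, str]]
Check = Callable[[dict], Failure]


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the real database."""
    stored: List[dict] = field(default_factory=list)
    abnormal: List[Tuple[int, int, str]] = field(default_factory=list)


def validity_check(row: dict) -> Failure:
    # Hypothetical validity rule: ID and name must be non-empty.
    if not row.get("id") or not row.get("name"):
        return (2204, "commodity parameter verification failed")
    return None


def make_duplicate_check() -> Check:
    seen = set()  # IDs this server has already processed (S1075)

    def check(row: dict) -> Failure:
        if row["id"] in seen:
            return (2202, "the commodity already exists")
        seen.add(row["id"])
        return None

    return check


def verify_sub_file(rows: List[dict], checks: List[Check], db: Database) -> None:
    for line_no, row in enumerate(rows, start=1):  # S1071: take the next row
        for check in checks:  # run the checks in their fixed order
            failure = check(row)
            if failure is not None:
                code, message = failure
                db.abnormal.append((line_no, code, message))  # record, then next row
                break
        else:  # every check passed: store the attribute data
            db.stored.append(row)
```

Because each row stops at its first failed check, every abnormal commodity produces exactly one abnormal record, keyed by its line number.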
  • The server may also record a unique identifier of each processed commodity, such as its line number.
  • If execution of the subtask times out and other servers take over the subtask, they can continue verification from the last verified commodity according to the recorded line number. This improves verification efficiency and reduces repeated verification of commodity attribute data.
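Resuming from the recorded line number can be sketched as below. This is a sketch under stated assumptions: the `checkpoint` dictionary stands in for wherever the line number is actually persisted, and `verify_row` is a hypothetical callback for the per-row checks.

```python
from typing import Callable, Dict, Iterable


def resume_verification(rows: Iterable[object], checkpoint: Dict[str, int],
                        verify_row: Callable[[object], None]) -> int:
    """Verify rows after the recorded line number and return the new checkpoint."""
    last_line = checkpoint.setdefault("last_line", 0)
    for line_no, row in enumerate(rows, start=1):
        if line_no <= last_line:
            continue  # already handled by the server that timed out
        verify_row(row)
        checkpoint["last_line"] = line_no  # persist after every row
    return checkpoint["last_line"]
```

Updating the checkpoint after every row keeps the amount of repeated work after a takeover to at most one row.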
  • A distributed lock is applied for on each subtask to prevent a single subtask from being executed multiple times. However, if a server malfunctions, it may occupy a subtask for a long time, reducing the execution efficiency of that subtask.
  • Two optional coping methods are provided:
  • Method 1: after acquiring the distributed lock for a subtask, the server starts timing. If the timed duration reaches the duration threshold and the subtask is still not completed, the server actively instructs the cache component to release the distributed lock on the subtask. Other servers can then apply for the subtask's distributed lock again and process the subtask.
  • Method 2: after allocating a subtask's distributed lock to a server, the cache component starts timing.
  • When the timed duration reaches the duration threshold, the cache component actively releases the distributed lock on the subtask, that is, forcibly unlocks it.
  • Any server can then apply for the subtask's distributed lock again and process the subtask.
  • Technicians can apply either or both of the above methods to automatically unlock the distributed locks of timed-out subtasks. A subtask whose execution becomes abnormal is thus released in time and can be acquired by other servers, achieving automatic node takeover of subtasks and greatly enhancing the reliability of subtask execution.
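Both coping methods amount to treating the distributed lock as a lease. The sketch below is a hypothetical in-memory stand-in for the cache component, similar in spirit to a Redis `SET key value NX PX milliseconds` lock. The class name, the injectable clock, and the owner check in `unlock` are assumptions, not details from the description.

```python
import time
from typing import Callable, Dict, Tuple


class CacheComponent:
    """Hypothetical in-memory cache component that grants distributed locks.

    Method 2: every lock is a lease; once it is older than the duration
    threshold, the next try_lock call forcibly takes it over.
    """

    def __init__(self, threshold: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.clock = clock
        self.locks: Dict[str, Tuple[str, float]] = {}  # subtask -> (owner, granted_at)

    def try_lock(self, subtask: str, owner: str) -> bool:
        now = self.clock()
        held = self.locks.get(subtask)
        if held is not None and now - held[1] < self.threshold:
            return False  # held by a server and not yet expired
        self.locks[subtask] = (owner, now)  # free or expired: grant the lock
        return True

    def unlock(self, subtask: str, owner: str) -> None:
        """Method 1: the server asks for release; only the current owner may."""
        held = self.locks.get(subtask)
        if held is not None and held[0] == owner:
            del self.locks[subtask]


def server_should_release(acquired_at: float, now: float, threshold: float) -> bool:
    """Method 1: the server's own timer says the subtask has run too long."""
    return now - acquired_at >= threshold
```

The owner check matters once leases can expire: a slow server that wakes up late must not release a lock that has already been granted to another server.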
  • For example, suppose the database records that subtask A started being executed by a server at 12:00, and 12:00 therefore remains the last update time of subtask A. Then after 12:05, although subtask A is still being executed normally, the database considers its execution to have timed out. Such a timeout would cause the subtask to be executed repeatedly by multiple servers, reducing processing efficiency.
  • To avoid this, after receiving the attribute data or abnormal information of a commodity, the database stores the received data, thereby storing the attribute data and recording the abnormal information. It also updates the subtask's last update time to the time at which the attribute data or abnormal information was received, keeping the execution time of the subtask up to date.
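The last-update-time mechanism can be sketched as a heartbeat. The `SubtaskRecord` class is hypothetical, and the 5-minute threshold used in the usage check below is only an assumption that matches the 12:00/12:05 example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SubtaskRecord:
    """Hypothetical per-subtask row holding its last update time."""
    last_update: datetime

    def on_data_received(self, received_at: datetime) -> None:
        # Storing attribute data or abnormal information refreshes the timestamp.
        self.last_update = received_at

    def timed_out(self, now: datetime, threshold: timedelta) -> bool:
        return now - self.last_update > threshold
```

With only the start time recorded, the subtask looks timed out at 12:06. After a row received at 12:04 refreshes the timestamp, it no longer does.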
  • The database may create a task detail (TaskDetail) table for each parent task and record each received item of abnormal information in it. After all sub-files of the commodity data have been verified, the task detail table holds all abnormal information for that commodity data.
  • From this table, the CP can clearly see which commodities in the commodity data have abnormal attribute data and provide corrected attribute data accordingly, improving the efficiency of commodity management.
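A task detail table of this kind can be sketched with Python's built-in `sqlite3` module. The column names and the choice of SQLite are assumptions made for illustration. The description names the TaskDetail table but does not give its schema.

```python
import sqlite3
from typing import List, Tuple


def create_task_detail(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per abnormal commodity.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS TaskDetail (
               parent_task_id TEXT    NOT NULL,
               sub_task_id    TEXT    NOT NULL,
               line_no        INTEGER NOT NULL,
               error_code     INTEGER NOT NULL,
               error_message  TEXT
           )"""
    )


def record_abnormal(conn: sqlite3.Connection, parent: str, sub: str,
                    line_no: int, code: int, message: str) -> None:
    conn.execute("INSERT INTO TaskDetail VALUES (?, ?, ?, ?, ?)",
                 (parent, sub, line_no, code, message))


def abnormal_report(conn: sqlite3.Connection, parent: str) -> List[Tuple]:
    """Everything the CP needs to locate and fix the abnormal rows of one parent task."""
    return conn.execute(
        "SELECT sub_task_id, line_no, error_code, error_message FROM TaskDetail "
        "WHERE parent_task_id = ? ORDER BY sub_task_id, line_no",
        (parent,),
    ).fetchall()
```

Keying rows by parent task, sub-task, and line number lets the CP trace each abnormal commodity back to its exact row in the uploaded commodity data.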
  • As an optional embodiment of processing subtasks in this application, reference may be made to FIG. 2E for the overall process of processing subtasks.
  • Multiple servers process the subtasks concurrently.
  • Distributed locks are also introduced, as follows:
  • Each server obtains subtasks from the database and at the same time applies for distributed locks on them, that is, grabs the locks.
  • The server that successfully grabs the lock for a subtask acts as the execution body of S104-S107 and downloads the sub-file corresponding to the subtask from the NSP. (Because subtasks are processed concurrently by multiple servers, the execution body of S104-S107 may be the same server or different servers for different subtasks.)
  • The commodity images are downloaded and stored in the NSP.
  • The attribute data of the commodities in the sub-file is stored in the database during verification. In this way, commodity images and attribute data are stored and processed concurrently.
  • After storing in the database the attribute data of all commodities in the sub-file that passed verification, the server determines that the subtask is complete and sends a state update instruction for the subtask to the database, so that the execution state of the subtask in the database is updated to execution completed.
  • After a commodity has been processed, this embodiment determines that verification of that commodity is complete. After all commodities in the subtask have been verified, the server sends a state update instruction to the database to inform it that the subtask has been executed. On receiving the instruction, the database updates the execution state of the subtask to execution completed.
  • The specific content of the state update instruction is not limited here.
  • the completion of the execution of the subtask includes at least two cases:
  • Because a distributed lock is applied for on each subtask to prevent a single subtask from being executed multiple times, the server releases the distributed lock on the subtask once the subtask is complete.
  • After completing the current subtask, the server continues with the next one: it returns to S104 and sends a task query request to the database again.
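The grab-lock, execute, update-state, release loop above can be sketched with threads standing in for servers. `TaskStore` is a hypothetical stand-in that merges the database and the cache component behind one `threading.Lock`, so that "find a pending subtask and lock it" is atomic. The download, verification, and storage work is reduced to a log entry.

```python
import threading
from typing import Dict, List, Optional, Tuple


class TaskStore:
    """Hypothetical stand-in for the database plus the cache component."""

    def __init__(self, subtasks: List[str]) -> None:
        self.status: Dict[str, str] = {s: "pending" for s in subtasks}
        self.locks: Dict[str, str] = {}  # subtask -> server holding its lock
        self._guard = threading.Lock()   # makes "query + lock" atomic

    def grab(self, server: str) -> Optional[str]:
        """S104 plus lock grabbing: return a pending, unlocked subtask or None."""
        with self._guard:
            for subtask, state in self.status.items():
                if state == "pending" and subtask not in self.locks:
                    self.locks[subtask] = server
                    return subtask
            return None

    def complete(self, subtask: str) -> None:
        with self._guard:
            self.status[subtask] = "completed"  # state update instruction
            del self.locks[subtask]             # release the distributed lock


def worker(store: TaskStore, server: str, log: List[Tuple[str, str]]) -> None:
    while (subtask := store.grab(server)) is not None:
        log.append((server, subtask))  # download, verify, and store would go here
        store.complete(subtask)


def run(n_subtasks: int, n_servers: int) -> Tuple[TaskStore, List[Tuple[str, str]]]:
    store = TaskStore([f"sub{i}" for i in range(n_subtasks)])
    log: List[Tuple[str, str]] = []
    threads = [threading.Thread(target=worker, args=(store, f"server{k}", log))
               for k in range(n_servers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, log
```

Whatever the interleaving, each subtask appears in the log exactly once, and every lock is released when the run ends.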
  • After receiving the task query request, the database identifies the execution status of each subtask in the parent task. If all subtasks in the parent task are completed, it determines that storage of the commodity data is complete, generates the storage result, and sends the storage result to the server.
  • the server feeds back the storage result to the CP terminal.
  • Otherwise, step S105 is executed at this time.
  • In other words, after receiving the task query request, the database identifies the execution status of each subtask under the parent task. Unlike when the parent task and subtasks were created, at least one server has by now executed subtasks under the parent task, so for the parent task there are two possible cases:
  • For a commodity that passed verification, completion of processing means that its attribute data has been stored.
  • For a commodity that failed verification, completion of processing means that the corresponding abnormal information has been recorded in the database. Therefore, once every commodity has been processed in one of these two ways, this embodiment determines that the current storage of commodity data is complete.
  • the storage situation may include the following:
  • If some commodities are abnormal, the database records the corresponding abnormal information.
  • In that case, the storage result can be set as partially successful storage of the commodity data, and the recorded abnormal information can be regarded as part of the storage result.
  • If the task detail table is used to record abnormal information, the task detail table is used as part of the storage result.
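How the database might assemble the storage result is sketched below. The dictionary layout, the status strings, and the assumption that each abnormal commodity produces exactly one abnormal record are all hypothetical. The description only says that the result may be partial success with the abnormal information (or the task detail table) attached.

```python
from typing import Dict, List, Optional, Tuple

AbnormalRecord = Tuple[int, int, str]  # (line number, exception code, message)


def build_storage_result(subtask_status: Dict[str, str],
                         abnormal: List[AbnormalRecord],
                         total_commodities: int) -> Optional[dict]:
    """Return None while any subtask is unfinished, else the storage result."""
    if any(state != "completed" for state in subtask_status.values()):
        return None  # not every subtask is done yet
    failed = len(abnormal)  # assumes one record per abnormal commodity
    return {
        "status": "success" if failed == 0 else "partial_success",
        "stored": total_commodities - failed,
        "failed": failed,
        "task_detail": list(abnormal),  # the TaskDetail rows, if that table is used
    }
```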
  • After obtaining the storage result, the database sends it to the server, which feeds it back to the CP terminal. Finally, the CP terminal displays the storage result for the CP to view.
  • S110 may be completed by the database itself, or may be completed by the terminal device where the database is located. For details, refer to the relevant description in S105.
  • In this way, the CP achieves effective upload of the commodity data.
  • If the attribute data of some commodities in the commodity data is abnormal, the CP can view the abnormal information in the storage result (if a task detail table exists, the CP can view it directly).
  • The abnormal commodities can be identified from the abnormal information, and their attribute data can be reorganized or checked. This attribute data is then re-uploaded to the NSP as new commodity data, retrying storage of the abnormal commodities.
  • the CP only needs to provide commodity data according to certain format requirements, and the commodity data may be structured or unstructured data.
  • the commodity management system splits the commodity data to obtain multiple sub-files, and creates corresponding sub-tasks for each sub-file.
  • One or more servers perform data verification for each subtask, storing the attribute data of the commodities in the sub-files in the database while storing the corresponding commodity images in the network storage platform. This makes warehousing more efficient and enables efficient management of commodity data.
  • the step-by-step warehousing operation of attribute data is the operation of structured warehousing of commodity data. Therefore, regardless of whether the commodity data is structured or unstructured data, the embodiments of the present application can implement the structured storage of commodity data.
  • the CP only needs to provide the commodity data according to the format requirements, and can realize the offline import of the commodity data into the database (abbreviated as offline import, offline means that the user does not need to operate online after uploading).
  • The CP need not perform data structuring operations itself. In practice, the CP generally has to organize its commodity data anyway (for example, for inventory management or for listing on e-commerce platforms), so it only needs to organize that data according to the format requirements, without much extra work.
  • the embodiment of the present application greatly reduces the technical threshold of CP operation, and has higher usability.
  • the automatic verification and data storage of commodity data also greatly improves the management efficiency of commodity data.
  • the embodiments of the present application can implement highly concurrent and efficient processing of subtasks.
  • If a distributed lock is held for too long, both the server and the cache component can automatically unlock the subtask.
  • This allows other servers to re-apply for the lock and process the subtask, realizing node management and automatic takeover of subtasks.
  • It also prevents a subtask from being left unexecuted for a long time when its server cannot process it normally, for example because of a server failure, making subtask processing more reliable.
  • This embodiment can effectively process large quantities of commodity data and can therefore support large task scenarios.
  • Through analysis and feedback of the abnormal information in commodity attribute data, task failures can be displayed in detail, which helps the CP supplement abnormal attribute data in a targeted manner and improves the CP's operating efficiency.
  • In the embodiment above, when the server verifies the attribute data of a sub-file, it stores the attribute data and records the abnormal information synchronously during verification.
  • Alternatively, attribute data verification may be performed on the whole sub-file first, and the attribute data and abnormal information in the sub-file then stored in the database. Referring to FIG. 3A, in this case S107 can be replaced with:
  • the server performs attribute data verification on each commodity in the sub-file.
  • The server completes verification of the attribute data of every commodity in the sub-file.
  • All the attribute data is then put into storage, and the exception information of abnormal attribute data is recorded: all abnormal information is sent to the database together.
  • the embodiment of the present application may be applied in combination with the embodiment shown in FIG. 2D .
  • In that case, this embodiment first performs S1071-S10711, and after verification of the sub-file is complete, stores the attribute data and exception information in the database.
  • In Embodiment a, commodity attribute data is stored while a single sub-file is being verified, and each verification and storage operation takes the attribute data of a single commodity as its object.
  • In Embodiment b, commodity attribute data is stored only after verification of the whole sub-file is complete.
  • In Embodiment a, the server verifies and stores the attribute data of one commodity at a time, so each operation has single-commodity granularity.
  • In Embodiment b, the server stores commodity attribute data only after verification of a single sub-file is complete, so its granularity is the individual sub-file.
  • The verification granularity of Embodiment a is therefore finer than that of Embodiment b.
  • In Embodiment a, the server can in principle synchronize attribute data verification and warehousing at the level of a single commodity, so the attribute data or abnormal information of each commodity in the sub-file is stored in the database synchronously during verification. On this basis, if the server becomes abnormal, the database still holds all commodity attribute data in the current sub-file that was verified before the abnormality.
  • When other servers re-verify the sub-file, they can either start from the beginning or continue with the commodity attribute data in the sub-file that has not yet been stored.
  • For example, suppose sub-file a contains 1000 commodities. The server verifies the attribute data of each commodity in turn, and an exception occurs while the 500th commodity is being verified (its verification has not yet completed), so verification cannot continue.
  • At this point, the attribute data of the first 499 commodities has all been stored.
  • When other servers verify sub-file a, they can re-check the attribute data of all 1000 commodities, or re-check from the 500th commodity onward.
  • In Embodiment b, if the server cannot continue verifying the current sub-file because of an abnormality, the database obtains none of the attribute data in that sub-file, so other servers must re-check the whole sub-file.
  • Embodiment a can therefore, in principle, reduce repeated verification of sub-files, reducing the verification workload and responding effectively to server abnormalities.
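The 1000-commodity example can be reproduced with a small simulation. This is a hypothetical sketch: every row is assumed to pass verification, and the server abnormality is modelled as an exception raised when the crashing row is reached.

```python
from typing import List, Optional


class ServerCrash(Exception):
    """Simulated server abnormality while verifying a given commodity."""


def embodiment_a(rows: List[int], db: List[int], crash_at: Optional[int]) -> None:
    # Verify and store one commodity at a time (single-commodity granularity).
    for i, row in enumerate(rows, start=1):
        if i == crash_at:
            raise ServerCrash(i)
        db.append(row)  # verification assumed to pass; store immediately


def embodiment_b(rows: List[int], db: List[int], crash_at: Optional[int]) -> None:
    # Verify the whole sub-file first, then store everything in one batch.
    verified: List[int] = []
    for i, row in enumerate(rows, start=1):
        if i == crash_at:
            raise ServerCrash(i)
        verified.append(row)
    db.extend(verified)


def stored_before_crash(strategy, n: int = 1000, crash_at: Optional[int] = 500) -> int:
    db: List[int] = []
    try:
        strategy(list(range(1, n + 1)), db, crash_at)
    except ServerCrash:
        pass  # the sub-file is left for another server to take over
    return len(db)
```

With a crash at the 500th commodity, Embodiment a has already stored 499 commodities and Embodiment b has stored none. That difference is why a takeover server can resume from the 500th row only in Embodiment a.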
  • Technicians can select Embodiment a or Embodiment b to verify and store commodity data according to actual application requirements; this is not limited here.
  • Part 1: Supplementary explanations of how the commodity management system manages commodity data.
  • The offline import progress of commodity data can be displayed to the CP.
  • the server will count the progress of offline import of commodity data, and feed it back to the CP terminal. Details are as follows:
  • the server sends a progress query request to the database.
  • After receiving the progress query request, the database obtains a first number, the number of subtasks under the parent task that have been executed and completed, and a second number, the number of all subtasks under the parent task.
  • the database generates progress data according to the first quantity and the second quantity, and sends it to the server.
  • the server sends progress data to the CP terminal.
  • the CP terminal displays the progress data.
  • That is, the server may send a progress query request to the database. The database responds by obtaining the number of completed subtasks under the parent task (the first number) and the total number of subtasks (the second number), generating progress data from these two numbers, and sending the data to the server, which forwards it to the CP terminal for display.
  • The server may query actively according to certain rules, such as periodically, or in response to a query initiated by the CP.
  • In the latter case, the CP operates the CP terminal, which sends a query request to the server, and the server then queries the database.
  • This embodiment does not limit how offline import progress is represented, so the format and parsing of the progress data are not limited here and can be set by technicians according to actual needs.
  • For example, offline import progress can be expressed as a percentage.
  • It may also be expressed as "number of completed subtasks/total number of subtasks".
  • In that case, the database does not need to process the two numbers and can feed them back to the server directly as progress data.
  • The CP terminal can then display progress in the form "number of completed subtasks/total number of subtasks". For example, if the parent task has 20 subtasks, of which 10 have been completed, the CP terminal can display the offline import progress as "10/20".
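Both progress representations can be derived from the first and second numbers. The `Progress` class and the status strings are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Progress:
    completed: int  # first number: subtasks executed and completed
    total: int      # second number: all subtasks under the parent task

    def as_fraction(self) -> str:
        return f"{self.completed}/{self.total}"

    def as_percentage(self) -> str:
        pct = 100.0 if self.total == 0 else 100.0 * self.completed / self.total
        return f"{pct:.0f}%"


def query_progress(subtask_status: Dict[str, str]) -> Progress:
    """Database side: count completed subtasks and all subtasks."""
    done = sum(1 for state in subtask_status.values() if state == "completed")
    return Progress(completed=done, total=len(subtask_status))
```

For the example of 20 subtasks with 10 completed, `as_fraction()` gives "10/20" and `as_percentage()` gives "50%".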
  • the embodiment of the present application realizes the progress feedback of offline import of commodity data, so that the CP can know the progress in time.
  • The embodiments shown in FIG. 2A to FIG. 3A can also be used for online management of commodity data.
  • Online management of commodity data includes adding, deleting, modifying, and querying commodity data.
  • adding, deleting, and modifying refers to adding attribute data of new products, deleting attribute data of existing products, and modifying attribute data of existing products on the basis of commodity data already in storage.
  • Query refers to querying the attribute data of existing products on the basis of commodity data already in storage.
  • The embodiments shown in FIG. 2A to FIG. 3A may preferentially be used to implement offline import and management of commodity data.
  • In that case, the CP uploads the commodity data to the commodity management system and waits for a period of time, after which storage and management of the commodity data are complete.
  • the embodiments shown in FIG. 2A to FIG. 3A can be used to implement offline import of commodity data.
  • The commodity data storage and management method actually used can be selected by the CP as needed and is not restricted here.
  • The embodiments shown in FIG. 2A to FIG. 3A can meet full-scenario requirements, whether the number of commodities is large or small.
  • The server provides a callable API to the CP terminal and supports software development kits (SDKs) in Java, PHP, C++, Python, and other languages.
  • the commodity management system provides the CP with commodity management services (ie commodity management implemented by the embodiments shown in FIGS. 2A to 3A ), and provides users with online commodity search services.
  • the CP invokes the server API through the CP terminal to trigger the commodity management service of the commodity management system to complete the online management of commodity data.
  • Any one of the embodiments shown in FIG. 2A to FIG. 3A can be used for this.
  • In online management, sub-file splitting may be omitted.
  • Instead, the server API directly verifies each piece of attribute data in the commodity data and performs operations such as storing attribute data and recording exception information.
  • The CP can call the server API through the CP terminal to inform the server of the commodities to be operated on and the specific operation content. The server then operates on those commodities in the database according to the operation content. For example, it can query or modify the price of commodity A in the database, or delete all attribute data of commodity A.
  • FIG. 3C is a service scenario interaction diagram of a commodity management system based on the embodiment shown in FIG. 3B .
  • the CP can operate the CP terminal according to requirements, and use the CP terminal to call the server API to trigger the commodity management service of the commodity management system.
  • The commodity management system manages the commodity data according to the actual operation on the CP terminal and informs the CP terminal of the management result, for example the result of deleting, modifying, or querying commodity data.
  • the user can operate the user terminal as required, and input the product text or image into the product management system through the user terminal to trigger the online search service of the product management system.
  • After receiving the commodity text or image sent by the user terminal, the commodity management system performs an online commodity search based on it and returns the search result to the user terminal for the user to view.
  • the execution subject of each group of operations may be any one of the multiple servers.
  • This embodiment does not limit how the specific execution subject of each group of operations is determined; it can be set by technicians according to actual needs.
  • For example, one server may be selected from the configured servers to be responsible for performing operation 1.
  • That server then performs commodity data splitting, sub-file uploading, and task creation.
  • Alternatively, each time a CP uploads commodity data, a server may be randomly selected from the multiple servers to perform operation 1. For example, a distributed lock mechanism can be introduced: each server applies to the cache component for a distributed lock on the commodity data at the same time, and only the server that obtains the lock performs operation 1.
  • Similarly, a server may be selected from the configured servers to be responsible for performing operation 2. In that case, no matter which CP uploads commodity data, that server performs subtask querying, sub-file downloading, verification, and attribute data storage. In other optional embodiments, when there are subtasks waiting to be executed, multiple servers may query and process subtasks at the same time, achieving highly concurrent processing. For example, with a distributed lock mechanism, each server applies to the cache component for distributed locks on subtasks at the same time, and a single subtask is executed by the server that obtains its distributed lock.
  • A server may likewise be selected from the configured servers to be responsible for performing operation 3; the database then sends each storage result to that server for forwarding to the CP terminal. In other optional embodiments, each time the database obtains a storage result, it randomly selects a server from the multiple servers and sends the storage result to it.
  • One server may also be selected from the configured servers to be responsible for performing operation 4.
  • Each time, the CP terminal informs that server of the commodities to be operated on and the specific operation content, and the server performs the online management of the commodity data.
  • Alternatively, each request from the CP terminal may be sent to a randomly selected server among the multiple servers, which then performs the online management of the commodity data.
  • Part 2: Commodity search by users.
  • This embodiment provides users with a commodity search function. Users can upload commodity-related text or images to the commodity management system as needed, and the system performs the commodity search and returns search results based on the uploaded text or images.
  • Commodity search can be divided into two stages, pre-search and in-search, detailed as follows:
  • Image feature analysis: before searching, image feature analysis must first be performed on the commodity images to obtain image feature data for image search.
  • the operation of image feature analysis may occur in the following two stages:
  • Stage 1: during verification of the sub-files in the embodiments shown in FIG. 2A to FIG. 3A, image feature analysis is performed on the commodity images synchronously.
  • Stage 2: after the embodiments shown in FIG. 2A to FIG. 3A complete storage of the commodity data, image feature analysis is performed on the commodity images stored in the NSP.
  • Technicians can choose either of these two stages for image feature analysis according to requirements.
  • For example, if image feature analysis is set to occur in stage 1, the server performs image feature analysis on the images of commodities that pass verification while it verifies each sub-file, and stores the resulting image feature data in the feature library. If it is instead set to occur in stage 2, the server performs image feature analysis on the images of commodities that passed verification after storage of the commodity data is complete, and likewise stores the resulting image feature data in the feature library.
  • the server obtains the image of the product, performs image feature analysis on the image of the product, and stores the obtained image feature data in a feature library.
  • the server in this embodiment of the present application may be a server that verifies sub-files. Can also be other servers. The specifics can be set by technical personnel according to requirements, which are not limited here. Correspondingly, depending on the situation of the server and the stage in which the image feature analysis occurs, there may be differences in the way of “acquiring” the product images. For example, when S301 is implemented by a server that performs sub-file verification. Obtaining may refer to the server reading the downloaded product image (refer to S106, at this time, the server has downloaded the product image through the image download address). For the case where the product image is not included locally, the product image needs to be downloaded from the NSP. In this case, the acquisition refers to downloading the product image from the NSP.
  • The embodiments of the present application do not restrict the specific image feature analysis method; technical personnel can choose it according to actual needs.
  • For example, an image feature extraction model based on a neural network or deep learning can be pre-trained and used for image feature analysis.
  • The data type and content of the image feature data depend on the specific image feature analysis method.
  • For example, the data may be image feature points together with feature vectors describing them, or a single image feature vector such as a 1024-dimensional floating-point vector; other kinds of feature data are also possible.
  • Commodities within the same category share certain characteristics; for example, they often have similar shapes. Therefore, to improve image feature analysis so that the resulting image feature data characterizes each commodity better,
  • different image feature extraction models may be pre-designed for different commodity categories. During analysis, the model matching the commodity's actual category is selected (in this case, the commodity data must include the commodity's category).
  • For example, suppose commodity categories are divided into 10 categories: clothing, digital home appliances, shoes, luggage, home furnishing, toys, beauty, accessories, food, and others.
  • One image feature extraction model can then be designed for each category, giving 10 models in total.
  • The model for the current commodity is first determined from the category in its commodity data, and that model is then used to analyze the commodity's picture and obtain the image feature data.
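  • A per-category model registry is one simple way to express this selection. In the hypothetical sketch below, `MODELS` maps each of the 10 example categories to a placeholder extractor (a real system would load a pre-trained network instead), and falling back to the "others" model when the category is missing or unknown is an assumption that the text does not specify.

```python
from typing import Callable, Dict, List

FeatureExtractor = Callable[[bytes], List[float]]

# The 10 example categories from the text
CATEGORIES = ["clothing", "digital home appliances", "shoes", "luggage",
              "home furnishing", "toys", "beauty", "accessories", "food", "others"]


def make_placeholder_model(category: str) -> FeatureExtractor:
    """Stand-in for a pre-trained per-category network (hypothetical)."""
    def extract(image: bytes) -> List[float]:
        # Dummy features: image size in bytes and the category index
        return [float(len(image)), float(CATEGORIES.index(category))]
    return extract


MODELS: Dict[str, FeatureExtractor] = {c: make_placeholder_model(c) for c in CATEGORIES}


def analyse_commodity(product_data: dict, image: bytes) -> List[float]:
    # Pick the model by the category recorded in the commodity data;
    # unknown or missing categories fall back to "others" (assumption).
    category = product_data.get("category", "others")
    model = MODELS.get(category, MODELS["others"])
    return model(image)
```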
  • To improve the accuracy of image search, the embodiment of the present application also introduces commodity information as an auxiliary input for image feature analysis of commodity pictures, as follows:
  • A neural-network-based image feature analysis model is pre-trained, and commodity pictures are analyzed with this model to obtain the corresponding image feature data.
  • Corresponding image feature analysis models can also be set separately (for example, one per category).
  • The training process of the image feature analysis model includes:
  • The initial model extracts features from commodity information used as sample data, and a first loss function for commodity information is calculated from the extracted text features and the corresponding classification labels.
  • The text features may be word vectors or other text features.
  • The extraction method is not limited here; for example, the commodity information can be segmented into words and word vectors obtained through text embedding.
  • The text features can be processed and classified with, for example, a fully connected layer, and the loss function is then calculated from the classification results and the classification labels.
  • The initial model also extracts image features from commodity pictures used as sample data, and a second loss function for commodity pictures is calculated from the image features and the corresponding classification labels.
  • Likewise, the image features can be processed and classified with a fully connected layer, and the loss function is then calculated from the classification results and the classification labels.
  • A third loss function is calculated from the first and second loss functions, and the initial model is iteratively updated according to the value of the third loss function until a preset convergence condition is met, yielding a trained model.
  • The networks used for feature extraction from commodity pictures are then extracted from the trained model, and together they form the image feature analysis model.
  • The specific types of the first, second, and third loss functions are not limited here and can be set by technical personnel according to requirements.
  • For example, the first loss function (for commodity information) may be a text class loss function (Text Class Loss), or another loss function.
  • The second loss function (for commodity pictures) may be an image triplet loss function (Image Triplet Loss) or an image class loss function (Image Class Loss), or another loss function.
  • The third loss function may be a Kullback-Leibler (KL) divergence loss, or another loss function.
  • In other words, classification-model training is applied separately to the commodity pictures and the commodity information of the sample commodities.
  • Multimodal fusion training is then performed: the loss values of the two modalities are fused through a new loss function, and the model is iteratively updated based on the fused loss value.
  • Finally, the networks used for feature extraction from commodity pictures are extracted (i.e., the feature extraction networks for commodity information are discarded) to form a new model for image feature analysis.
  • An image feature analysis model trained in this way extracts commodity picture features more accurately and reliably, and the resulting image feature data characterizes commodity pictures better.
  • Image feature data extracted with this model therefore achieves high accuracy in commodity image matching.
  • The embodiment of the present application uses this image feature analysis model to analyze each commodity picture and obtain the corresponding image feature data.
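  • The loss arrangement can be illustrated with the forward computation only. This NumPy sketch assumes both branches end in a classifier producing logits over the same label set, uses cross-entropy for the first (commodity information) and second (commodity picture) losses, and fuses them as `first + second + kl_weight * KL(p_text || p_image)`. That fusion formula is one plausible reading of "a third loss function calculated from the first and second", not a confirmed definition; gradient updates would be done with an autodiff framework and are omitted.

```python
import numpy as np

EPS = 1e-12  # guards log(0)


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.mean(np.log(picked + EPS)))


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    # Batch mean of sum_k p_k * log(p_k / q_k)
    return float(np.mean(np.sum(p * (np.log(p + EPS) - np.log(q + EPS)), axis=1)))


def fused_loss(text_logits: np.ndarray, image_logits: np.ndarray,
               labels: np.ndarray, kl_weight: float = 1.0):
    first = cross_entropy(text_logits, labels)    # commodity information branch
    second = cross_entropy(image_logits, labels)  # commodity picture branch
    kl = kl_divergence(softmax(text_logits), softmax(image_logits))
    third = first + second + kl_weight * kl       # assumed fusion rule
    return first, second, third
```

The KL term pulls the two branches toward the same class distribution, which is how the text branch can shape the image features even though only the image branch is kept after training.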
  • The server acquires the commodity picture, performs commodity detection on it, and crops the commodity image according to the detection result.
  • The server performs image feature analysis on the commodity image and stores the resulting image feature data in the feature library.
  • The embodiment of the present application does not restrict the commodity detection method (essentially object recognition); technical personnel can choose it according to actual needs.
  • For example, an object localization method can be used to locate all objects in the commodity picture.
  • Because the commodity is generally placed in front of the camera when photographed, after the objects are located, the object occupying the largest pixel area can be identified as the commodity, and the image inside its bounding box can be cropped.
  • The size of the cropped commodity image cannot be determined in advance.
  • If image feature analysis were performed directly on the cropped image, the size of the image feature data could be uncontrolled, which hinders subsequent operations such as image feature matching. Therefore, in this embodiment of the present application, before S3012 the length or width of the commodity image may be padded with pixels so that the two are equal. The commodity image is then scaled to a preset size, such as 299×299, and the scaled image is used as the object of image feature analysis in S3012.
  • The amount of image feature data obtained in this way is relatively controllable.
  • The commodity detection of S3011 can also be performed manually by a CP or a technician.
  • That is, the CP or technician manually selects the commodity box in the commodity picture, and the server then crops the commodity image.
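  • Cropping and size normalisation can be sketched with NumPy alone. The sketch assumes detector boxes are given as `(x0, y0, x1, y1)` pixel tuples, pads the shorter side with zeros (black) so the crop sits centred in a square, and uses nearest-neighbour scaling to reach 299×299. The padding colour, centring, and interpolation are illustrative choices; a production pipeline would more likely call an image library's resize.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def largest_box(boxes: List[Box]) -> Box:
    # The commodity is assumed to be the object with the largest pixel area
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def crop_pad_resize(image: np.ndarray, boxes: List[Box], size: int = 299) -> np.ndarray:
    x0, y0, x1, y1 = largest_box(boxes)
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    side = max(h, w)
    # Pad the shorter side with zeros so the crop sits centred in a square
    padded = np.zeros((side, side) + crop.shape[2:], dtype=crop.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    padded[top:top + h, left:left + w] = crop
    # Nearest-neighbour scaling: map each output index to a source index
    idx = np.arange(size) * side // size
    return padded[idx][:, idx]
```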
  • The number of commodities is often large, and it grows sharply as more CPs use the commodity management system. The amount of image feature data obtained from commodity pictures grows correspondingly, which puts considerable storage pressure on the feature library. To reduce this pressure and the cost of the feature library, in this embodiment of the present application the image feature data is further compressed after it is obtained, and the compressed image feature data is stored in the feature library. The compression method is not restricted and can be set by technical personnel according to requirements; for example, the precision of the image feature data can be reduced to reduce the data volume.
  • The compression method can be either of the following:
  • principal component analysis (PCA);
  • converting floating-point image feature data into binary image feature data.
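  • Both compression options can be sketched in NumPy. PCA is fitted here with an SVD of the centred feature matrix, and binarisation keeps one sign bit per dimension, packed 8 dimensions per byte with `np.packbits`; for a 1024-dimensional float32 vector that is 128 bytes instead of 4096. The function names and the threshold at 0 are assumptions for illustration, and binary features would then be compared by Hamming distance rather than cosine similarity.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int):
    mean = features.mean(axis=0)
    centred = features - mean
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def pca_compress(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    # Project onto the leading principal directions
    return ((features - mean) @ components.T).astype(np.float32)


def binarize(features: np.ndarray) -> np.ndarray:
    # One bit per dimension (positive -> 1), packed 8 dimensions per byte
    return np.packbits(features > 0, axis=1)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    # Number of differing bits between two packed binary feature vectors
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```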
  • FIG. 4B is a schematic flowchart of a method for performing image feature analysis on commodity pictures before search, described as follows:
  • The server obtains commodity pictures from the commodity data and adds an API for commodity image search.
  • The server performs commodity detection on each commodity picture and crops the commodity image according to the detection result.
  • The server performs image feature analysis on the commodity image to obtain image feature data.
  • The server compresses the obtained image feature data and stores the compressed image feature data in the feature library.
  • The commodity management system in the embodiment of the present application provides users with a commodity search function.
  • Commodity search includes text search and image search.
  • Accordingly, the commodity data that the CP is required to upload must include the commodity picture or its download address.
  • The user can send a commodity search request to the commodity management system through the user terminal and upload the commodity picture or commodity text (i.e., descriptive text related to the commodity) to be searched.
  • After receiving the commodity picture or commodity text to be searched, the commodity management system searches the stored commodity data and determines one or more matching commodities. It then returns the attribute data and commodity pictures of the matched commodities to the user terminal, which displays them.
  • The commodity management system is responsible for real-time management of commodity information, including adding, deleting, modifying, and querying commodities, as well as offline import of commodity data. Specifically, this covers searching by commodity picture and commodity text, and storing commodity data (that is, generating and storing commodity information).
  • The user terminal can upload the commodity picture or commodity text to be searched directly to the commodity management system, which returns the commodity search results and the commodity list.
  • The right half of FIG. 4C shows that the commodity management system can provide commodity data management support for each e-commerce partner: e-commerce partners upload commodity data to the commodity management system as CPs, and the system stores and manages that data using the embodiments shown in FIG. 2A to FIG. 3A.
  • The commodity distribution service layer mainly interfaces with the media portal.
  • The user terminal uploads the commodity picture or commodity text to be searched to the commodity distribution service through the media portal.
  • The commodity distribution service centrally manages the commodity pictures or commodity texts uploaded by each user terminal and forwards them to the commodity management system to request commodity matching, thereby implementing commodity search. It is also responsible for returning the commodity list generated by the commodity management system to the user terminal.
  • Because the commodity search workload can be considerable when the number of users is large, the commodity distribution service can be implemented by a dedicated server, or by a server randomly selected from the multiple servers of the commodity management system. Technical personnel can choose according to actual needs; this is not limited here.
  • The text search process includes:
  • The user terminal sends the commodity text entered by the user to the server.
  • The server, which is one of the execution bodies of this embodiment of the present application, is the server in the commodity management system responsible for commodity search.
  • It may be a server pre-selected by a technician in the commodity management system, or a server selected automatically according to certain rules (for example, at random) from the one or more servers in the commodity management system; this is not limited here.
  • The user terminal provides a commodity search function.
  • When a user needs to search for a commodity, the user can enable the commodity search function and enter the text of the commodity or upload a corresponding commodity picture.
  • The commodity search function can be provided in at least the following ways:
  • Take a mobile phone as the user terminal as an example.
  • (a) in FIG. 5B shows the commodity search function integrated into the phone's local search function as an input box.
  • The user can turn on this function when needed.
  • For example, the commodity search function can be placed on the phone's leftmost home screen (the "-1 screen").
  • When the commodity search function is enabled, the corresponding input box is displayed.
  • The user can enter commodity text in the input box or upload a commodity picture.
  • After acquiring the commodity text or commodity picture entered by the user, the phone uploads it to the server in the commodity management system.
  • (b) in FIG. 5B shows the commodity search function integrated into a web page on the phone as an input box.
  • The user can visit the web page in the phone's browser and enter commodity text in the page's input box or upload a commodity picture. After acquiring the commodity text or commodity picture, the web page uploads it to the server in the commodity management system.
  • Text search is described below, taking commodity text entered by the user as an example.
  • Commodity text is descriptive text related to the commodity. It can be a passage or a few keywords, such as the commodity's name, characteristics, or brand.
  • Because the commodity text is entered by the user based on what they know about the commodity being searched for, its actual content depends on the application scenario. For example, it might be a keyword such as "shorts", "skirt", or "bread", or a sentence such as "5G full-screen mobile phone, 50-megapixel quad camera".
  • The server performs text matching against the commodity information of each commodity in the database according to the commodity text, and selects the first commodity information of the top n commodities with the highest text matching degree.
  • Here, n is a positive integer.
  • After obtaining the commodity text, the server performs text matching on the commodity information of each commodity in the database and obtains the matching degree between each commodity and the commodity text.
  • The embodiments of the present application do not restrict the text matching method; technical personnel can set it according to actual needs.
  • For example, a semantic text matching method can be used, such as a neural-network-based semantic matching model.
  • Alternatively, a character-based string matching method can be used, such as the brute-force (BF) algorithm, the Rabin-Karp (RK) algorithm, or the Knuth-Morris-Pratt (KMP) algorithm.
  • The embodiment of the present application selects the commodities with the highest text matching degree and uses their commodity information (i.e., the first commodity information) as the matching result.
  • The number n of commodity information entries to select is not limited here and can be set by technical personnel, for example to any value from 10 to 20, or from 20 to 100.
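  • As an illustration of the character-based option, the sketch below implements KMP substring search and uses it to rank commodities by how many query keywords appear in their commodity information, keeping the top n. The keyword-count score and whitespace tokenisation are simplifying assumptions (Chinese text would first need word segmentation), and `top_n_matches` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def kmp_contains(text: str, pattern: str) -> bool:
    """Knuth-Morris-Pratt search: True if pattern occurs in text."""
    if not pattern:
        return True
    # failure[i]: length of the longest proper prefix of pattern[:i + 1] that is also its suffix
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    k = 0
    for ch in text:
        while k and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading text characters
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def top_n_matches(query: str, commodity_info: Dict[str, str], n: int) -> List[Tuple[str, int]]:
    keywords = query.lower().split()
    # Matching degree = number of query keywords found in the commodity information
    scores = [(cid, sum(kmp_contains(info.lower(), kw) for kw in keywords))
              for cid, info in commodity_info.items()]
    scores = [s for s in scores if s[1] > 0]
    scores.sort(key=lambda s: (-s[1], s[0]))  # highest score first, ties by id
    return scores[:n]
```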
  • In practice, the number of commodities stored in the database may be extremely large,
  • so matching the text directly against all of them involves a heavy workload.
  • Therefore, the commodities can first be screened by category.
  • That is, S402 can be replaced with S4021-S4022:
  • The server identifies the commodity category from the commodity text and obtains the corresponding first category.
  • For this purpose, the CP is required to provide the category attribute of each commodity in the commodity data,
  • so the commodity information stored in the database records the category of each commodity.
  • The embodiment of the present application does not restrict the specific category classification rules; they can be preset by technical personnel and communicated to the CP.
  • After receiving the commodity text, the server first identifies the commodity category, that is, determines which category the commodity the user is searching for belongs to.
  • The embodiment of the present application does not restrict the specific category identification method; it can be set by technical personnel.
  • For example, keyword matching can be used: technical staff preset some common keywords for each category, which can be recorded as a list of commodity nouns. For example, if the categories include "clothing", related keywords such as "clothes", "tops", "pants", and "skirts" can be set under the "clothing" category.
  • A keyword search is then performed on the commodity text,
  • and the category of the keyword found is taken as the category of the commodity text (i.e., the first category).
  • The server performs text matching of commodity information only for commodities under the first category in the database, according to the commodity text, and selects the first commodity information of at least one commodity with the highest text matching degree.
  • After determining the category of the commodity text, the server matches the text only against commodities under that category in the database. For example, if the category of the commodity text is "clothing", the server performs text matching only on the commodity information of commodities under "clothing" in the database and obtains their text matching degrees.
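  • The keyword-based category screening can be sketched as a lookup followed by a filter. `CATEGORY_KEYWORDS` is a hypothetical noun list: the clothing keywords come from the example above, while the "food" entry is invented for illustration. When several categories match, the first in dictionary order wins, and falling back to the full commodity set when no keyword is found is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical commodity-noun lists maintained by technical staff
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "clothing": ["clothes", "tops", "pants", "skirts"],
    "food": ["bread", "snack"],  # illustrative entry, not from the source
}


def identify_category(text: str) -> Optional[str]:
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        # Substring test: the first category with a matching keyword wins
        if any(kw in lowered for kw in keywords):
            return category
    return None


def candidates(text: str, commodities: List[dict]) -> List[dict]:
    category = identify_category(text)
    if category is None:
        return commodities  # no keyword found: search everything (assumption)
    # Only commodities in the first category go on to text matching
    return [c for c in commodities if c["category"] == category]
```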
  • The server acquires the attribute data in the first commodity information from the database and the commodity pictures associated with the first commodity information from the NSP, generates a commodity list from the acquired attribute data and commodity pictures, and sends the commodity list to the user terminal.
  • S403 can be refined into S4031-S4033:
  • The server acquires the attribute data in the first commodity information from the database.
  • The server acquires the commodity pictures associated with the first commodity information from the NSP.
  • The server generates a commodity list from the acquired attribute data and commodity pictures and sends the commodity list to the user terminal.
  • That is, the embodiment of the present application further retrieves the attribute data contained in the commodity information from the database and the commodity pictures corresponding to the commodity information from the NSP, and sends both to the user terminal.
  • Commodity information contains many attributes of a commodity, but in practice some of them may not matter to the user. For example, suppose the commodity information includes the download address of the commodity picture; since the embodiment downloads the picture from the NSP, the download address is of no interest to the user. Therefore, in this embodiment the retrieved attribute data may be some or all of the attribute data contained in the commodity information.
  • The specific attribute data to retrieve can be set by technical personnel according to actual needs. For example, it can include the commodity's name, price, and link; if the commodity information contains a description of the commodity, the description can also be retrieved.
  • The link may be any one or more of a web page link, an App link, and a quick app link, used to jump to the corresponding web page (including an HTML5 page), App page, or quick app page that displays the commodity.
  • The web pages, App pages, and quick app pages to which the links point are collectively referred to as commodity display pages.
  • The embodiment of the present application does not restrict the e-commerce platform to which a linked commodity display page belongs.
  • A CP can set the links of a commodity according to its cooperation with different e-commerce platforms. Therefore, in practice, a commodity's links may point to one or more commodity display pages on different e-commerce platforms, and users can click whichever link they need to jump to the commodity display page on the corresponding platform.
  • Alternatively, different link priorities can be preset, and the user terminal can automatically jump to the commodity display page pointed to by a higher-priority link.
  • For example, suppose CP1 sells commodity A on both e-commerce platform A and e-commerce platform B, i.e., both platforms have a commodity display page for commodity A.
  • Suppose also that e-commerce platforms A and B each have a website, an App, and a quick app.
  • The CP can then set, in the commodity data, links to platforms A and B in their respective websites, Apps, and quick apps, i.e., at least 6 links in total.
  • The embodiment of the present application sorts the attribute data and commodity pictures per commodity: the commodities are first sorted according to certain rules, and their attribute data and commodity pictures are then arranged in the same order. After sorting, the attribute data and commodity picture of a single commodity are placed in the same row, with different commodities in different rows, yielding a commodity list composed of the sorted attribute data and commodity pictures. The commodity list is then returned to the user terminal as the search result for the commodity text.
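  • Assembling the commodity list can be sketched as follows, assuming the "certain rules" for sorting are the text matching degree in descending order, that attribute data is a dict per commodity, and that only a configured subset of fields (here name, price, and link, as in the example above) is kept so that internal fields such as an image download address are dropped. `Row` and `build_product_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Row:
    commodity_id: str
    attributes: Dict[str, str]  # only the fields shown to the user
    picture: bytes              # commodity picture fetched from the NSP


def build_product_list(match_scores: Dict[str, float],
                       attribute_data: Dict[str, Dict[str, str]],
                       pictures: Dict[str, bytes],
                       fields: Sequence[str] = ("name", "price", "link")) -> List[Row]:
    # Sort commodities by matching degree, highest first (assumed ordering rule)
    ordered = sorted(match_scores, key=lambda cid: match_scores[cid], reverse=True)
    rows = []
    for cid in ordered:
        attrs = attribute_data[cid]
        # One row per commodity: the selected attributes plus its picture
        kept = {f: attrs[f] for f in fields if f in attrs}
        rows.append(Row(cid, kept, pictures[cid]))
    return rows
```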
  • After receiving the commodity list, the user terminal displays it on the screen so that the user can see the search results for the commodity text.
  • The embodiment of the present application does not restrict how the commodity list is displayed; technical personnel can set this according to their needs.
  • For example, a card can be generated for each commodity in the commodity list, showing that commodity's attribute data and commodity picture. The display screen of the user terminal then shows one card per commodity.
  • Take FIG. 5C as an example.
  • Suppose the user enters the commodity text "wine glass".
  • The search result contains 4 commodities, each with three attributes (commodity name, price, and link) and a corresponding commodity picture.
  • The embodiment of the present application generates a card for each commodity.
  • The card displays the commodity picture and the attribute data.
  • Each link is displayed in the card as a control.
  • When the user taps a link, the user terminal jumps to the commodity display page pointed to by that link.
  • If the link is a web page link, the user terminal starts the browser and opens the web page that displays the commodity.
  • If the link is an App link, the user terminal starts the corresponding App and opens the App page that displays the commodity.
  • If the link is a quick app link, the user terminal starts the corresponding quick app and opens the quick app page that displays the commodity.
  • Take FIG. 5D as an example, building on the example shown in FIG. 5C. Suppose the user taps web page link 1 in the first commodity card (see (a) in FIG. 5D). The user terminal then starts the browser and opens the website page that displays the commodity (see (b) in FIG. 5D), where the user can view the commodity details and make a purchase or perform other operations.
  • Alternatively, priorities between different links may be preset by the technician or the CP.
  • In this case, the links themselves are not displayed.
  • Again, suppose the search result contains 4 commodities, each with three attributes (commodity name, price, and link) and a corresponding commodity picture.
  • A card is generated for each commodity,
  • and the card displays the commodity picture and all attribute data except the links.
  • Building on this display, if the commodity list contains a commodity's links and the user taps that commodity's card, the user terminal opens the commodity display page pointed to by the highest-priority link. If that fails, it tries the commodity display page pointed to by the next-highest-priority link, and so on, until a commodity display page is opened successfully.
  • When opening a link, the user terminal starts the browser (or the corresponding App or quick app) and jumps to the corresponding page. If the user terminal does not have the App, quick app, or browser corresponding to the link, the jump fails, and in this embodiment the link with the next-highest priority is selected instead.
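  • The priority fallback reduces to trying links in priority order until one opens. In this sketch, `Link`, `open_by_priority`, and the `try_open` callback are hypothetical; smaller numbers are assumed to mean higher priority, and `try_open` stands in for the platform-specific attempt to launch the browser, App, or quick app.

```python
from typing import Callable, List, NamedTuple, Optional


class Link(NamedTuple):
    priority: int  # smaller value = higher priority (assumed convention)
    kind: str      # "web", "app" or "quick_app"
    target: str


def open_by_priority(links: List[Link], try_open: Callable[[Link], bool]) -> Optional[Link]:
    """Try links from highest to lowest priority; return the first that opens."""
    for link in sorted(links, key=lambda item: item.priority):
        if try_open(link):
            return link
    return None  # no commodity display page could be opened
```

For example, with an App link at priority 1 and a web link at priority 2, a terminal without the App falls through to the browser.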
  • Thus, the user can search for a commodity by entering commodity text on the user terminal, view the attribute data of one or more matching commodities on the user terminal, and open the commodity display pages as needed. This makes commodity search much more convenient for the user and increases commodity exposure.
  • Image search: referring to FIG. 6, the image search process includes:
  • The user terminal uploads the commodity picture selected by the user to the server.
  • S501 is basically the same as S401; for specific operation details, principles, and beneficial effects, refer to the relevant description of S401, which is not repeated here.
  • For example, the user can select a local picture on the user terminal and upload it to the server as the commodity picture.
  • Alternatively, the user can take a photo with the user terminal and upload it to the server as the commodity picture.
  • In principle, the entry point for image search can be embedded in any function that involves taking photos or browsing images.
  • For example, the image search function can be embedded in the camera function of the user terminal, so that after photographing an everyday object the user can directly enable image search to find the corresponding commodity.
  • Likewise, the image search function can be embedded in the gallery of the user terminal, so that while browsing the gallery the user can enable image search as needed to find the commodity corresponding to a given image.
  • Technicians can place the image search entry point in one or more functions of the user terminal according to requirements.
  • Placing the image search entry point in different functions makes image search convenient for users, enabling "photo shopping" anytime and anywhere, and also increases commodity exposure, bringing more traffic to merchants and e-commerce platforms.
  • The server performs image feature analysis on the received commodity picture to obtain first image feature data.
  • The image feature analysis applied to the commodity picture uploaded by the user is the same as that applied to uploaded commodity pictures before search; for details, refer to the relevant description in S301, which is not repeated here.
  • As before, commodities within the same category share certain characteristics (for example, similar shapes), so to make the image feature data characterize commodities better,
  • different image feature extraction models may be pre-designed for different commodity categories.
  • In this case, category identification (also called intent classification) is required:
  • image feature analysis is first performed on the commodity picture to determine the actual category of the commodity being searched for, and the corresponding model is then selected for analysis.
  • Category identification of commodity pictures is essentially automatic commodity classification. Therefore, a category classification model can be set up in advance for the known commodity categories and then used to classify and identify the commodity category.
  • This embodiment of the present application does not restrict the type or architecture of the category classification model; technicians can set it according to actual needs.
  • If S301 performs image feature analysis with the image feature analysis model obtained through multimodal fusion (for details of that model, refer to the corresponding description in S301),
  • then the same image feature analysis model as in S301 is used here to analyze the received commodity picture and obtain the corresponding image feature data (i.e., the first image feature data).
  • S3011-S3012 can also be applied in this embodiment of the present application.
  • That is, S502 can be replaced with:
  • The server performs commodity detection on the received commodity picture and crops the commodity image according to the detection result.
  • The server performs image feature analysis on the commodity image to obtain the first image feature data.
  • S5021-S5022 are basically the same as S3011-S3012; for specific operation details, principles, and beneficial effects, refer to the relevant descriptions of S3011-S3012, which are not repeated here.
  • S503: Using the first image feature data, the server performs feature matching against the image feature data in the feature library.
  • The n second image feature data with the highest matching degree are selected from the feature library, and the n commodities corresponding to them are determined.
  • That is, the embodiment of the present application matches the first image feature data against the image feature data stored in the feature library and selects the top n entries with the highest matching degree (i.e., the second image feature data). The products corresponding to these entries are then used as the target products for this search.
  • The embodiments of the present application do not restrict the specific feature matching method, which technical personnel may choose according to actual needs.
  • For example, an open-source search engine can be used to implement feature matching.
  • Faiss, for instance, can be used: it computes the similarity between image features and returns the requested number of products ranked by similarity.
  • The number n of image feature data to select is not limited here and can be set by technical personnel, for example to any value from 10 to 20, or from 20 to 100.
  • S504: Obtain product pictures of the n products from the NSP and attribute data of the n products from the database, generate a product list from the acquired attribute data and product pictures, and send the product list to the user terminal.
  • S504 can be refined into S5041 to S5043:
  • S5041: The server obtains commodity pictures of the n commodities from the NSP.
  • S5042: The server obtains attribute data of the n commodities from the database.
  • S5043: The server generates a commodity list from the acquired attribute data and commodity pictures, and sends the commodity list to the user terminal.
  • The embodiment of the present application downloads the commodity pictures of these commodities from the NSP.
  • The attribute data of the n commodities is downloaded from the database.
  • A product list is generated from the obtained attribute data and product pictures and sent to the user terminal.
  • The downloading of the attribute data and the generation of the commodity list are basically the same as in S406; refer to the relevant description of S406, which is not repeated here.
  • The server may also reorder the attribute data and product pictures of the products in the product list.
  • Product color matters greatly to the user experience.
  • Therefore, product images in the product list whose color is similar to that of the product image uploaded by the user terminal, together with their attribute data, can be placed toward the front of the product list.
  • The server performs trademark detection on the commodity picture uploaded by the user terminal to obtain the first trademark information contained in that picture.
  • S602: The server performs trademark detection on each commodity picture in the commodity list to obtain the second trademark information contained in these pictures.
  • The server matches the first trademark information against each piece of second trademark information, and sorts the attribute data and product pictures of the commodities in the commodity list in descending order of matching degree.
  • The trademark information includes at least one of a brand name and a trademark pattern.
  • The specific content can be set by technical personnel according to actual needs.
  • The embodiments of the present application do not restrict the method of detecting trademark information, which can be chosen by technical personnel.
  • For example, an image recognition method based on a neural network model may be used, or a set of trademark images may be preset and matched against the picture.
  • The trademark information contained in the product images is thus used for secondary matching, and the obtained product list is re-sorted according to the secondary matching result.
  • In this way, the attribute data and pictures of commodities highly similar to the commodity the user wants to find are displayed first on the user terminal.
  • Another possible implementation of the reordering in the present application includes:
  • The server performs trademark detection on the commodity picture uploaded by the user terminal to obtain the first trademark information contained in that picture.
  • The server extracts the third trademark information of the n commodities from the acquired attribute data.
  • The server matches the first trademark information against each piece of third trademark information, and sorts the attribute data and product pictures of the commodities in the commodity list in descending order of matching degree.
  • Here, the trademark information (both the first and the third trademark information) is the brand name.
  • Trademark detection is performed on the commodity picture to identify the brand name of the trademark it contains (i.e., the first trademark information).
  • The brand name of each commodity (i.e., the third trademark information) is obtained from its attribute data.
  • Secondary matching is then performed on the brand names, and the obtained product list is re-sorted according to the secondary matching result.
  • In this way, the attribute data and pictures of commodities highly similar to the commodity the user wants to find are displayed first on the user terminal.
  • The server performs trademark detection on the commodity picture uploaded by the user terminal to obtain the first trademark information contained in that picture.
  • S608: The server performs trademark detection on each commodity picture in the commodity list to obtain the second trademark information contained in these pictures.
  • The server extracts the third trademark information of the n commodities from the acquired attribute data.
  • The server matches the first trademark information against the second and third trademark information of the n commodities, and sorts the attribute data and product pictures of the commodities in the commodity list in descending order of matching degree.
  • The first trademark information includes the brand name and may additionally include the trademark pattern. If the first trademark information contains only the brand name, the second trademark information is the brand name. If it contains both the brand name and the trademark pattern, the second trademark information may contain the brand name, the trademark pattern, or both.
  • The third trademark information is the brand name.
  • Trademark detection is performed on the product image to identify the first trademark information it contains.
  • The brand name of each product (i.e., the third trademark information) is obtained from its attribute data.
  • The second trademark information is identified from the product pictures of the n products.
  • Secondary matching is performed on the three types of trademark information obtained, and the commodity list is reordered according to the matching result.
  • On the one hand, the first trademark information may be matched against each piece of second trademark information to obtain a first matching degree for each of the n commodities.
  • On the other hand, the first trademark information is matched against each piece of third trademark information to obtain a second matching degree for each of the n commodities. The final matching degree of each commodity is then determined from the first and second matching degrees (for example, by a weighted sum) and used as the matching result.
  • An error on the CP side may cause the attribute data of the same commodity to be placed repeatedly in the commodity data; for example, tops that differ only in size may be entered as separate records. A single product may then correspond to several pieces of attribute data. If such a product ranks highly, its attribute data may appear repeatedly in the product list, which degrades the user experience.
  • The embodiment of the present application therefore deduplicates the commodity list: for each product that appears more than once, only one set of attribute data and product pictures is retained, and the others are deleted. This deduplicated update makes the commodity list more useful and improves the user experience.
  • The commodity list displayed in S505 is the deduplicated, updated list.
  • S505: The user terminal displays the commodity list.
  • S505 is basically the same as S404. For the specific operation details, principles, and beneficial effects, refer to the relevant description of S404, which is not repeated here.
  • This applies to the display of the image search results (that is, the product list) and to the way a user's click on a link is handled.
  • The input data simply changes from the commodity text “red wine glass” to a picture of a red wine glass.
  • FIG. 4B is a schematic flowchart of a method for performing an image search on a product image uploaded by a user. It is described as follows:
  • The user terminal uploads the product image to the server through the API.
  • The server performs commodity detection on the commodity image and crops it according to the detection result.
  • The server performs category recognition on the commodity image to obtain the second category.
  • Based on the second category, the server performs image feature analysis on the commodity image to obtain the first image feature data.
  • The server compresses the first image feature data to obtain compressed first image feature data.
  • Based on the first image feature data, the server performs feature matching against the image feature data in the feature library to obtain a product list.
  • The server reorders the commodity list to obtain a sorted commodity list.
  • Commodity deduplication is performed on the sorted commodity list, and the deduplicated commodity list is sent to the user terminal.
  • The commodity management system thus provides both text search and image search.
  • A CP needs to store its commodity data only once to distribute commodities through multiple channels. The system provides the e-commerce platform with more ways to search for products and has high practical value.
  • The term “if” may be interpreted, depending on the context, as “when”, “once”, “in response to determining”, or “in response to detecting”.
  • Similarly, the phrases “if it is determined” or “if [the described condition or event] is detected” may be interpreted, depending on the context, as “once it is determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • The terms “first”, “second”, “third”, etc. are used only to distinguish descriptions and should not be construed as indicating or implying relative importance. Although the terms “first”, “second”, etc. are used in some embodiments of the present application to describe various elements, these elements should not be limited by these terms, which are used only to distinguish one element from another.
  • For example, a first table could be named a second table, and similarly a second table could be named a first table, without departing from the scope of the various described embodiments.
  • The first table and the second table are both tables, but they are not the same table.
  • References in this specification to “one embodiment”, “some embodiments”, and the like mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application.
  • Thus, appearances of the phrases “in one embodiment”, “in some embodiments”, “in some other embodiments”, “in other embodiments”, etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments” unless specifically emphasized otherwise.
  • The terms “comprising”, “including”, “having”, and their variants mean “including but not limited to” unless specifically emphasized otherwise.
  • The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • An embodiment of the present application further provides a server, including at least one memory, at least one processor, and a computer program stored in the at least one memory and executable on the at least one processor. When the processor executes the computer program, the server implements the steps in any of the foregoing method embodiments.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product which, when run on a server, causes the server to implement the steps in each of the foregoing method embodiments.
  • An embodiment of the present application further provides a chip system including a processor coupled to a memory; the processor executes a computer program stored in the memory to implement the steps in the foregoing method embodiments.
  • If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the foregoing embodiments may also be implemented by a computer program instructing the relevant hardware.
  • The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments.
  • The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.
  • The computer-readable storage medium may include any entity or apparatus capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
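The image-search steps described above (top-n feature matching in S503, trademark-based reordering using a weighted sum of two matching degrees, and commodity deduplication) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: cosine similarity stands in for the matching degree (a library such as Faiss would replace the brute-force scan in practice), trademark matching is reduced to exact comparison of brand names, the weights are arbitrary, and the `Candidate` fields, including the `group_key` used to spot duplicate records, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str       # commodity that the stored feature vector belongs to
    group_key: str        # hypothetical key shared by duplicate records (e.g. same top, other size)
    features: list        # second image feature data from the feature library
    picture_brand: str    # second trademark information, detected in the product picture
    attribute_brand: str  # third trademark information, taken from the attribute data


def cosine(a, b):
    """Cosine similarity, used here as the feature matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query, library, n):
    """Feature matching: keep the n library entries most similar to the query."""
    return sorted(library, key=lambda c: cosine(query, c.features), reverse=True)[:n]


def trademark_score(first_brand, c, w_picture=0.5, w_attribute=0.5):
    """Weighted sum of the first and second matching degrees (exact match -> 1.0)."""
    first_degree = 1.0 if c.picture_brand == first_brand else 0.0
    second_degree = 1.0 if c.attribute_brand == first_brand else 0.0
    return w_picture * first_degree + w_attribute * second_degree


def rerank(first_brand, candidates):
    """Sort by trademark score, high to low; the stable sort keeps similarity order on ties."""
    return sorted(candidates, key=lambda c: trademark_score(first_brand, c), reverse=True)


def deduplicate(candidates):
    """Keep only the first (highest-ranked) record of each group_key."""
    seen, result = set(), []
    for c in candidates:
        if c.group_key not in seen:
            seen.add(c.group_key)
            result.append(c)
    return result


def image_search(query, first_brand, library, n):
    """Feature matching, then trademark reordering, then deduplication."""
    return deduplicate(rerank(first_brand, top_n(query, library, n)))
```

Because Python's `sorted` is stable even with `reverse=True`, commodities with equal trademark scores keep their image-similarity order, so the trademark match acts as a secondary ranking layered on top of feature matching.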

Abstract

The present application provides a commodity data management method and apparatus, and a server, which are applicable to the technical field of data management. The method comprises: acquiring commodity data and splitting it into at least one first sub-file, each first sub-file comprising attribute data of at least one commodity; and performing attribute data verification on each first sub-file and storing the attribute data that passes verification in a database. The embodiments of the present application greatly lower the technical threshold of commodity data management operations and offer higher usability. Furthermore, automatic verification and storage of the commodity data greatly improve the efficiency of commodity data management.

Description

Commodity data management method and apparatus, and server
This application claims priority to Chinese Patent Application No. 202011152745.1, entitled "Commodity data management method, device and server" and filed with the China National Intellectual Property Administration on October 23, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of data management, and in particular relates to a commodity data management method, apparatus, and server.
Background
E-commerce has become a mainstream way of shopping. In a common model, a content provider (CP, which may be a merchant or someone other than the merchant) supplies commodity data to a commodity management system, and users search for commodities through that system. This gives commodities rapid exposure and brings substantial traffic to merchants.
To allow users to search for commodities, existing commodity management systems require the CP to provide structured commodity data, which the system then stores and uses to provide a commodity search service. However, this requires the CP to have a certain level of data-handling skill, which raises the barrier to entry. Moreover, most CPs have a large number of commodities, and providing structured data for all of them takes considerable manpower and material resources, so the workload is often heavy. In summary, the prior art manages commodity data inefficiently and sets a high barrier for CP operations, which hinders effective management of commodity data.
Summary of the Invention
In view of this, embodiments of this application provide a commodity data management method, apparatus, and server, which can solve the problem of low commodity data management efficiency in the prior art.
A first aspect of the embodiments of this application provides a commodity data management method, applied to a server, including:
Acquiring commodity data and splitting the commodity data into at least one first sub-file, where each first sub-file contains attribute data of at least one commodity.
Performing attribute data verification on each first sub-file, and storing the attribute data that passes verification in a database.
In the data management process of the embodiments of this application, the CP only needs to provide commodity data in the required format for the data to be imported offline into the database. In practice, CPs already need to organize their commodity data (whether for inventory management or for listing products on e-commerce platforms), so they only need to arrange that data in the required format, with little extra work. After receiving the commodity data, the commodity management system splits it into multiple sub-files (the first sub-files), verifies the attribute data in each sub-file, and uploads the attribute data that passes verification. The server may process the sub-files serially or in parallel; with parallel processing, it can verify multiple sub-files at once, which improves verification efficiency.
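The split-then-verify flow can be illustrated with a small sketch. It assumes the commodity data has already been parsed into rows (dicts), that a sub-file is simply a fixed-size slice of rows, and that verification means checking a hypothetical set of required fields; `REQUIRED_FIELDS`, `rows_per_subfile`, and the list standing in for the database are illustrative only. A thread pool shows the parallel option; serial processing would just loop over the sub-files.

```python
from concurrent.futures import ThreadPoolExecutor

REQUIRED_FIELDS = ("sku", "name", "price")  # hypothetical required columns


def split_into_subfiles(rows, rows_per_subfile):
    """Split parsed commodity rows into first sub-files of at most rows_per_subfile rows."""
    return [rows[i:i + rows_per_subfile] for i in range(0, len(rows), rows_per_subfile)]


def verify_row(row):
    """A row passes if every required field is present and non-blank."""
    return all(str(row.get(f, "")).strip() for f in REQUIRED_FIELDS)


def verify_subfile(subfile):
    """Return the attribute data in one sub-file that passes verification."""
    return [row for row in subfile if verify_row(row)]


def import_commodity_data(rows, rows_per_subfile, database, workers=4):
    """Split, verify sub-files in parallel, and store the passing rows.

    `database` is any list-like sink standing in for the real database.
    Returns the number of sub-files created.
    """
    subfiles = split_into_subfiles(rows, rows_per_subfile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in sub-file order, even though verification runs concurrently.
        for valid_rows in pool.map(verify_subfile, subfiles):
            database.extend(valid_rows)
    return len(subfiles)
```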
Compared with the prior art, the embodiments of this application greatly lower the technical barrier for CP operations and offer higher usability. Automatic verification and storage of commodity data also greatly improve the efficiency of commodity data management.
In a first possible implementation of the first aspect, performing attribute data verification on each first sub-file and storing the verified attribute data in a database includes:
Selecting one sub-file from the at least one first sub-file as a second sub-file.
Performing attribute data verification on the second sub-file, and uploading the attribute data in the second sub-file that passes verification to the database.
After verification of the second sub-file is complete, returning to the operation of selecting one sub-file from the at least one first sub-file as the second sub-file, until all first sub-files have been verified.
In this embodiment, the server repeatedly selects a sub-file (the second sub-file) from the first sub-files and processes it, verifying the data of each first sub-file and storing the commodity data in that sub-file in the database as it goes. This makes importing more efficient and enables efficient management of commodity data.
Based on the first possible implementation of the first aspect, as a second possible implementation of the first aspect, before performing attribute data verification on the second sub-file and uploading the verified attribute data in the second sub-file to the database, the method further includes:
Creating, in the database, first subtasks in one-to-one correspondence with the at least one first sub-file.
Performing attribute data verification on the second sub-file and uploading the verified attribute data in the second sub-file to the database includes:
Determining a second subtask from the first subtasks stored in the database, and obtaining, from the at least one first sub-file, the second sub-file associated with the second subtask.
Returning to the operation of selecting one sub-file from the at least one first sub-file as the second sub-file until all first sub-files have been verified includes: returning to the operation of determining a second subtask from the first subtasks stored in the database, until all first subtasks have been executed.
In this embodiment, subtasks in one-to-one correspondence with the sub-files (the first subtasks) are created in the database, and a sub-file is selected by determining which subtask to execute (the second subtask). As a result, the database can record the verification status of each sub-file, and the server can easily determine which sub-file to verify next.
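The following sketch shows the subtask bookkeeping: one subtask record per sub-file, and a loop that repeatedly picks an unfinished subtask until none remain. The in-memory dict stands in for the database table, and the `SubTask` fields and status names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    task_id: int
    subfile: list            # the first sub-file this subtask is associated with
    status: str = "PENDING"  # hypothetical states: PENDING -> DONE


def create_subtasks(subfiles):
    """Create one first subtask per first sub-file (one-to-one)."""
    return {i: SubTask(i, subfile) for i, subfile in enumerate(subfiles)}


def next_subtask(table):
    """Determine a second subtask: here simply the first one not yet done."""
    for task in table.values():
        if task.status == "PENDING":
            return task
    return None


def run_all(table, verify_and_store):
    """Keep selecting and executing subtasks until every first subtask is done."""
    while (task := next_subtask(table)) is not None:
        verify_and_store(task.subfile)  # verify the second sub-file and store passing data
        task.status = "DONE"
```

Because the status lives in the table rather than in the loop, any server reading the same table can see which sub-files remain, which is what the later lock-based variants build on.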
Based on the second possible implementation of the first aspect, as a third possible implementation of the first aspect, determining a second subtask from the first subtasks stored in the database includes:
Obtaining the subtasks to be executed among the first subtasks, where the subtasks to be executed include first subtasks that have not been executed and first subtasks that are being executed but whose execution time exceeds a duration threshold.
Determining the second subtask from the subtasks to be executed.
In this embodiment, whether a subtask is to be executed is determined by its execution state. In practice, on the one hand, unexecuted subtasks need to be processed by a server. On the other hand, a server may fail to process a subtask normally, for example because it goes down. Such a subtask remains in the executing state but can no longer complete: however long the system waits, that server will not finish it, and the corresponding sub-file will never be verified. These subtasks therefore need to be reprocessed by other servers. For these two reasons, this embodiment treats both unexecuted subtasks and executing subtasks whose execution time has exceeded the threshold as subtasks to be executed. The server obtains, in real time, all subtasks to be executed and determines from them the second subtask to execute.
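Selecting the subtasks to be executed reduces to a filter over the subtask records. In this sketch the status values, the `started_at` field, and the use of seconds are assumptions: a subtask qualifies if it has not started, or if it is running but has been running longer than the duration threshold, which is how a subtask abandoned by a crashed server becomes eligible again.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubTask:
    task_id: int
    status: str                         # hypothetical states: "NOT_STARTED", "RUNNING", "DONE"
    started_at: Optional[float] = None  # time the current execution began, in seconds


def pending_subtasks(tasks, duration_threshold, now=None):
    """Return subtasks that are not started, or running longer than duration_threshold."""
    now = time.time() if now is None else now
    pending = []
    for task in tasks:
        if task.status == "NOT_STARTED":
            pending.append(task)
        elif (task.status == "RUNNING" and task.started_at is not None
              and now - task.started_at > duration_threshold):
            # Probably abandoned, e.g. the server processing it went down.
            pending.append(task)
    return pending
```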
Based on the third possible implementation of the first aspect, as a fourth possible implementation of the first aspect, determining the second subtask from the subtasks to be executed includes:
Requesting, in turn, a distributed lock on each subtask to be executed from a cache component.
If the distributed lock on a subtask to be executed is obtained, taking that subtask as the second subtask.
As an optional embodiment of this application, multiple servers may process the subtasks concurrently to improve processing efficiency. In this case, the server that executes the solutions of the first aspect is one of the servers processing subtasks. In practice, multiple servers may select the same subtask at the same time, which reduces the efficiency of commodity data processing. To prevent a single subtask from being processed by multiple servers at once, in this embodiment, after receiving the subtasks to be executed, the server first selects one of them and tries to acquire a distributed lock on it from the cache component. Because the distributed lock on a subtask can be held by only one server at a time, if no other server is processing the subtask, the server can in principle acquire the lock. Conversely, if another server is already processing the subtask, then, since a server must acquire the lock before executing a subtask, the cache component records that the lock is already held by that server, and the acquisition fails. On this basis, once the server acquires the distributed lock, it treats the subtask as the one to execute this time and downloads the corresponding sub-file. If acquiring the lock fails, it repeats the subtask selection to choose another suitable subtask.
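The lock-based claim step might look like the sketch below. The application does not name the cache component; `InMemoryLockCache` is a hypothetical single-process stand-in that mimics the set-if-absent-with-expiry semantics commonly provided by caches such as Redis (`SET key value NX PX ...`). The expiry (`ttl_seconds`) and the key naming are assumptions; a real deployment would use the shared cache so that all servers see the same locks.

```python
import time


class InMemoryLockCache:
    """Hypothetical stand-in for the cache component (single process, not thread-safe).

    acquire() mimics "set if absent, with expiry": it succeeds only when no
    unexpired lock exists for the key.
    """

    def __init__(self):
        self._locks = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # another server already holds an unexpired lock on this subtask
        self._locks[key] = (owner, now + ttl_seconds)
        return True

    def release(self, key, owner):
        """Release the lock only if `owner` holds it."""
        current = self._locks.get(key)
        if current is not None and current[0] == owner:
            del self._locks[key]
            return True
        return False


def claim_subtask(cache, pending_task_ids, server_id, ttl_seconds=300):
    """Request locks on pending subtasks in turn; the first lock obtained decides the second subtask."""
    for task_id in pending_task_ids:
        if cache.acquire(f"subtask-lock:{task_id}", server_id, ttl_seconds):
            return task_id
    return None  # every pending subtask is locked by another server
```

Only the server that holds a lock can release it, which keeps one server from accidentally freeing a lock that another server is still using.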
Based on the fourth possible implementation of the first aspect, as a fifth possible implementation of the first aspect, the process of performing attribute data verification on the second sub-file further includes:
Determining whether the verification time for the second sub-file has reached the duration threshold.
If the verification time for the second sub-file has reached the duration threshold, releasing the distributed lock on the second sub-file.
This prevents a faulty server from occupying a subtask for a long time, which would reduce subtask execution efficiency. The server tracks how long it has spent verifying the subtask and checks whether the duration threshold has been reached. If so, its verification of the subtask has timed out, possibly because the server itself has failed, so it releases the distributed lock on the sub-file, allowing other servers to execute the subtask. This achieves automatic node takeover of subtasks and greatly improves the reliability of subtask execution.
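A sketch of this self-imposed time limit follows. It assumes the server verifies the second sub-file row by row and checks the elapsed time before each row; the `locks` dict stands in for the cache component, and the injectable `clock` parameter (an assumption, added so the behavior can be tested) defaults to `time.monotonic`. Releasing the lock on timeout lets another server pick the subtask up, as described above.

```python
import time


def verify_with_time_limit(rows, verify_row, locks, lock_key,
                           duration_threshold, clock=time.monotonic):
    """Verify the rows of the second sub-file, giving up the lock on timeout.

    `locks` is a dict standing in for the cache component (lock_key -> owner).
    Returns (rows that passed verification, whether verification finished).
    """
    start = clock()
    passed = []
    for row in rows:
        if clock() - start >= duration_threshold:
            # Verification has run too long; this server may be faulty.
            # Release the distributed lock so another server can take over.
            locks.pop(lock_key, None)
            return passed, False
        if verify_row(row):
            passed.append(row)
    return passed, True
```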
On the basis of any one of the first to fifth possible implementation manners of the first aspect, as a sixth possible implementation manner of the first aspect, if the commodity data contains attribute data that fails verification, exception information for the attribute data that failed verification is obtained and stored in the database.
In this embodiment of the present application, the parsed attribute data is checked for validity, that is, each piece of attribute data is checked for problems such as missing or erroneous data. If such problems exist, verification of that attribute data fails, and the exception information corresponding to the parsing exception is uploaded to the database, which records it. The exception information can then be fed back to the CP, or the CP can query it directly. The CP can thus quickly learn which data has problems and perform targeted checks and re-import, which improves the efficiency of importing commodity data into the database.
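The validity check can be sketched as a per-row function that reports missing or malformed fields and turns them into exception records ready to be written to the database. The attribute fields below (`sku_id`, `name`, `price`, `image_url`) are assumptions for illustration only; the description does not specify the schema or the exact validation rules.

```python
from typing import Any

# Hypothetical attribute fields; the description does not list them.
REQUIRED_FIELDS = ("sku_id", "name", "price", "image_url")


def is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def validate_row(row: dict) -> list:
    """Return the problems found in one commodity's attribute data."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if is_missing(row.get(field))]
    price = row.get("price")
    if not is_missing(price):
        try:
            if float(price) < 0:
                problems.append("negative price")
        except (TypeError, ValueError):
            problems.append("price is not a number")
    return problems


def validate_rows(rows: list) -> tuple:
    """Split rows into valid attribute data and exception records for the database."""
    valid, exceptions = [], []
    for index, row in enumerate(rows):
        problems = validate_row(row)
        if problems:
            exceptions.append({"row": index, "sku_id": row.get("sku_id"), "errors": problems})
        else:
            valid.append(row)
    return valid, exceptions
```

Keeping the row index and `sku_id` in each exception record is what lets the CP locate and fix the offending rows before re-importing them.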
On the basis of any one of the first to sixth possible implementation manners of the first aspect, as a seventh possible implementation manner of the first aspect, the commodity data is in a data table format.
In this embodiment of the present application, the commodity data is set to a data table format. The data table format is a common data recording format and is what many CPs already use when organizing their commodity data day to day. If commodity data is required in a data table format, a CP therefore only needs to lightly organize its existing commodity data to produce what the commodity management system requires. This greatly lowers the technical threshold and workload for CPs and improves the efficiency of commodity data management.
On the basis of any one of the first to seventh possible implementation manners of the first aspect, as an eighth possible implementation manner of the first aspect, uploading the attribute data in the second sub-file to the database includes:
During attribute data verification of the second sub-file, uploading the attribute data in the second sub-file that has passed verification to the database; or, after attribute data verification of the second sub-file is completed, uploading the attribute data in the second sub-file that has passed verification to the database.
The embodiments of the present application provide two schemes for verifying commodity attribute data and storing it in the database:
Scheme 1: a single sub-file is verified and its commodity attribute data is stored in the database incrementally, with each verification and storage operation acting on the attribute data of a single commodity. (This corresponds to the embodiment shown in FIG. 2A.)
Scheme 2: the commodity attribute data of a single sub-file is stored in the database only after all of its attribute data has been verified. (This corresponds to the embodiment shown in FIG. 3A.)
The two schemes differ in their benefits as follows:
Scheme 1 operates at the granularity of a single commodity, whereas Scheme 2 operates at the granularity of a single sub-file.
In Scheme 1, the server must exchange data with the database many times while verifying a single sub-file. This consumes more network resources and places higher demands on the quality of the network connection between the server and the database.
In Scheme 1, while a sub-file is being verified, the attribute data or exception information of each commodity in the sub-file is stored in the database at the same time. If the server fails, the database still holds all commodity attribute data from the current sub-file that was verified before the failure. When another server re-verifies the sub-file, it can either start over from the beginning or continue with the commodity attribute data in the sub-file that has not yet been stored. Scheme 1 therefore has a more complete fault tolerance mechanism and higher fault tolerance.
In Scheme 2, if an abnormal condition prevents the server from continuing to verify the current sub-file, the database obtains none of the attribute data in that sub-file, so another server must re-verify the entire sub-file. In summary, when a server fails, Scheme 1 can in theory reduce the likelihood of re-verifying sub-files compared with Scheme 2, thereby reducing the verification workload and handling server failures effectively.
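The trade-off between the two schemes can be sketched with Python's built-in `sqlite3` module standing in for the database. This is an illustrative assumption, not the described system: the `commodity` table, its two columns and the `is_valid` callback are hypothetical, and storage of exception information is omitted for brevity.

```python
import sqlite3


def create_table(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS commodity (sku_id TEXT PRIMARY KEY, name TEXT)")


def import_scheme1(conn, rows, is_valid):
    """Scheme 1: verify and store one commodity at a time, committing each row.

    Rows already stored (for example by a server that failed midway) are
    skipped, so a takeover server can resume instead of starting over.
    """
    for row in rows:
        seen = conn.execute("SELECT 1 FROM commodity WHERE sku_id = ?", (row["sku_id"],))
        if seen.fetchone():
            continue  # imported before the failure
        if is_valid(row):
            conn.execute("INSERT INTO commodity VALUES (?, ?)", (row["sku_id"], row["name"]))
            conn.commit()  # one database round trip per commodity


def import_scheme2(conn, rows, is_valid):
    """Scheme 2: verify the whole sub-file first, then store it in one batch."""
    valid = [(r["sku_id"], r["name"]) for r in rows if is_valid(r)]
    conn.executemany("INSERT OR REPLACE INTO commodity VALUES (?, ?)", valid)
    conn.commit()  # if the server fails before this point, nothing is stored
```

The per-row `commit()` in Scheme 1 is what makes resumption possible, and it is also the source of the extra network round trips noted above.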
On the basis of any one of the first to eighth possible implementation manners of the first aspect, as a ninth possible implementation manner of the first aspect, the attribute data in the commodity data includes a commodity image download address, and the method further includes:
Downloading the commodity image according to the commodity image download address contained in the attribute data that has passed verification.
Performing image feature analysis on the commodity image to obtain image feature data.
Storing the image feature data in a feature library.
In this embodiment of the present application, image feature analysis is performed on the commodities stored in the database, and the results are stored in the feature library for use in subsequent commodity searches by users. The embodiments of the present application can therefore provide data support for subsequent commodity searches.
As an embodiment of the present application, performing image feature analysis on the commodity image to obtain image feature data includes:
Receiving a first commodity image uploaded by a user terminal.
Performing image feature analysis on the first commodity image to obtain first image feature data.
Performing image feature analysis on the commodity image to obtain image feature data.
In practice, when a CP photographs a commodity, objects other than the commodity are quite likely to be captured as well, so a commodity image may contain multiple objects. Performing feature analysis directly on such an image yields image feature data that also describes the other objects, which hampers subsequent image matching. In this embodiment of the present application, commodity detection is therefore performed on the commodity image before image feature analysis, and the commodity region is cropped and analyzed according to the detection result. The extracted commodity feature data thus matches the commodity itself more closely and is more accurate and reliable, which improves the accuracy and reliability of subsequent commodity searches.
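The detect-then-crop step can be sketched independently of any particular detector or feature model. In this hedged example the image is a toy grayscale grid (a list of pixel rows), and `detect` and `extract` are hypothetical callbacks standing in for a commodity detection model and the image feature analysis model; choosing the most confident box and falling back to the whole image when nothing is detected are assumptions, not rules stated in the description.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
Image = List[List[int]]          # toy grayscale image: rows of pixel values


def crop(image: Image, box: Box) -> Image:
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def commodity_features(image: Image,
                       detect: Callable[[Image], Sequence[Tuple[Box, float]]],
                       extract: Callable[[Image], List[float]]) -> List[float]:
    """Detect the commodity first, then extract features from its region only."""
    detections = detect(image)
    if not detections:
        return extract(image)  # assumed fallback: analyze the whole image
    box, _score = max(detections, key=lambda d: d[1])  # most confident detection
    return extract(crop(image, box))
```

Because only the cropped region reaches `extract`, background objects in the CP's photo no longer contribute to the stored feature vector.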
A second aspect of the embodiments of the present application provides a commodity search method applied to a server. The method includes:
Obtaining first image feature data of a first commodity image, where the first commodity image is an image uploaded by a user terminal.
Determining, from the image feature data stored in the feature library, at least one piece of second image feature data with the highest feature matching degree to the first image feature data.
Sending to the user terminal the second commodity images in one-to-one correspondence with the at least one piece of second image feature data, together with the attribute data associated with those second commodity images, where the second commodity images and associated attribute data are sent in an order sorted on the basis of the trademark information contained in the first commodity image.
In this embodiment of the present application, matching the features of the commodity image uploaded by the user enables accurate and fast search of commodities already stored in the database. The retrieved commodity data is also re-sorted according to the trademark information in the commodity images before being returned to the user, so that the attribute data and commodity images of the commodities most similar to the one the user is searching for are displayed first on the user terminal. This improves the accuracy and relevance of the search results.
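The feature matching step can be sketched as a top-k nearest-neighbour search over the feature library. The description does not name a similarity measure; cosine similarity is used here as an assumption, and the feature library is modelled as a plain dictionary from a hypothetical commodity ID to its feature vector. A production system would typically use a vector index instead of a linear scan.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, library, k=3):
    """Return the k (commodity_id, score) pairs most similar to the query features."""
    scored = ((cid, cosine(query, vec)) for cid, vec in library.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

For example, with a library of `{"A": [1, 0], "B": [0, 1], "C": [1, 1]}` and query `[1, 0.1]`, the two best matches are `A` and then `C`.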
On the basis of the first possible implementation manner of the second aspect, as a second possible implementation manner of the second aspect, before sending to the user terminal the second commodity images in one-to-one correspondence with the at least one piece of second image feature data and the attribute data associated with the second commodity images, the method further includes:
Obtaining first trademark information contained in the first commodity image.
Obtaining target trademark information of each target commodity, where a target commodity is the commodity associated with a piece of second image feature data, and the second commodity image and associated attribute data are the commodity image and attribute data of the target commodity.
Sorting the second commodity images and attribute data of the target commodities in descending order of the matching degree between their target trademark information and the first trademark information.
In this embodiment of the present application, after multiple highly relevant target commodities are matched, the trademark information in the commodity image uploaded by the user is matched against the trademark information of each target commodity, and the target commodities are sorted by matching degree. This reorders the target commodities based on trademark information.
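The trademark-based reordering can be sketched as a stable sort on a brand matching score. The description does not define the matching degree; `difflib.SequenceMatcher.ratio()` on lower-cased brand strings is used here purely as a stand-in, and the candidate dictionaries with a `brand` key are hypothetical.

```python
from difflib import SequenceMatcher


def brand_match(query_brand, target_brand):
    """Assumed matching degree in [0, 1]; 0.0 when either brand is unknown."""
    if not query_brand or not target_brand:
        return 0.0
    return SequenceMatcher(None, query_brand.lower(), target_brand.lower()).ratio()


def rerank_by_brand(candidates, query_brand):
    """Sort candidates by brand match, highest first.

    sorted() stays stable with reverse=True, so candidates with equal brand
    scores keep their original feature-similarity order.
    """
    return sorted(candidates,
                  key=lambda c: brand_match(query_brand, c.get("brand")),
                  reverse=True)
```

Relying on sort stability means the earlier image-similarity ranking acts as the tie-breaker without any extra bookkeeping.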
On the basis of the second possible implementation manner of the second aspect, as a third possible implementation manner of the second aspect, the target trademark information includes second trademark information and/or third trademark information.
The second trademark information is the trademark information contained in the second commodity image associated with the target commodity.
The third trademark information is the trademark information contained in the attribute data associated with the target commodity.
In this embodiment of the present application, the trademark information of a target commodity may be the trademark information contained in its commodity image, the trademark information recorded in its attribute data, or both. The embodiments of the present application can therefore obtain the trademark information of target commodities under a variety of practical conditions, ensuring reliable trademark matching. When both sources are used, the likelihood of obtaining the trademark information of the target commodity also increases.
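Combining the two sources of target trademark information can be sketched as follows. `image_brand` stands for the second trademark information (for example, the output of a logo or text recognition model, which is assumed here and not specified in the description), and `attributes["brand"]` stands for the third trademark information; both names are hypothetical. Taking the best score over all available brands is one reasonable way to use "both" sources, not the only one.

```python
def target_brands(attributes, image_brand=None):
    """Collect target trademark info from the commodity image and/or attribute data.

    Either source may be missing; duplicates are removed case-insensitively.
    """
    brands = []
    for raw in (image_brand, attributes.get("brand")):
        brand = (raw or "").strip()
        if brand and brand.lower() not in {b.lower() for b in brands}:
            brands.append(brand)
    return brands


def best_brand_score(query_brand, brands, score):
    """Use the best matching degree over all available target brands (0.0 if none)."""
    return max((score(query_brand, b) for b in brands), default=0.0)
```

When only one source is available the function still returns a usable list, which reflects the point above that either source alone is sufficient.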
On the basis of any one of the first to third possible implementation manners of the second aspect, as a fourth possible implementation manner of the second aspect, performing image feature analysis on the first commodity image to obtain the first image feature data includes:
Performing image feature analysis on the first commodity image using a pre-trained image feature analysis model to obtain the first image feature data, where the image feature analysis model is extracted from a neural network model trained on commodity image samples and attribute data samples of multiple commodity samples.
In this embodiment of the present application, image feature analysis uses a model trained on data in two dimensions, commodity images and attribute data, which improves the accuracy of feature analysis and the reliability of subsequent feature matching.
As an embodiment of training the image feature analysis model, the training includes:
Presetting an initial model.
Obtaining commodity images and corresponding commodity information of multiple sample commodities, using these commodity images and commodity information as sample data, and adding to each commodity image and each piece of commodity information the classification label of the corresponding sample commodity.
Using the initial model to extract features from the commodity information in the sample data, and calculating a first loss function for the commodity information based on the extracted text features and the corresponding classification labels.
Using the initial model to extract image features from the commodity images in the sample data, and calculating a second loss function for the commodity images based on the image features and the corresponding classification labels.
Calculating a third loss function based on the first loss function and the second loss function, and iteratively updating the initial model according to the calculated value of the third loss function until a preset convergence condition is met, to obtain a trained model.
Extracting from the trained model each network used for commodity image feature extraction, to obtain an image feature analysis model composed of these extracted networks.
In this embodiment of the present application, a classification-model training approach is used to process the commodity images and the commodity information of the sample commodities separately. After the loss functions for the two dimensions are obtained, multimodal fusion training is performed: the loss values of the two dimensions are fused through a new loss function, and the model is iteratively updated based on the fused loss value. Finally, the networks in the trained model that are used for commodity image feature extraction are extracted (that is, the networks of the commodity information feature extraction part are discarded) to form a new model for image feature analysis. In practice, an image feature analysis model trained in this way has been found to extract commodity image features more accurately and reliably, and the resulting image feature data characterizes commodity images well. Image feature data extracted with this model achieves high accuracy in commodity image matching.
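The loss fusion and branch extraction can be sketched in plain Python. The description only says that the third loss is "based on" the first two; a weighted sum of two softmax cross-entropy losses is assumed here, and the weights `w_text` and `w_image` are hypothetical. Likewise, `extract_image_branch` models the trained network as a dictionary of named sub-networks and keeps those whose (hypothetical) names start with `image_`; a real framework would export the image encoder's parameters instead.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def fused_loss(text_logits, image_logits, label, w_text=0.5, w_image=0.5):
    """Third loss as an assumed weighted sum of the text and image losses."""
    first = cross_entropy(text_logits, label)    # loss on commodity information
    second = cross_entropy(image_logits, label)  # loss on commodity image
    return w_text * first + w_image * second


def extract_image_branch(model):
    """Keep only the image feature-extraction networks; drop the text branch."""
    return {name: net for name, net in model.items() if name.startswith("image_")}
```

Because both branches share the same classification labels, the text loss pushes the shared objective toward features that separate commodities by their descriptions, even though only the image branch is kept afterwards.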
A third aspect of the embodiments of the present application provides a commodity data management system, including a first server, a second server and a database.
The first server is configured to obtain commodity data and split the commodity data into at least one first sub-file, where each first sub-file contains the attribute data of at least one commodity.
The second server is configured to perform attribute data verification on each first sub-file and store the attribute data that passes verification in the database.
In the data management process of this embodiment of the present application, the CP only needs to provide commodity data in the required format for the commodity data to be imported into the database offline. In practice a CP has to organize its commodity data anyway (whether for inventory management or for listing on an e-commerce platform), so it only needs to arrange that data in the required format, with little extra work. After receiving the commodity data, the commodity management system splits it into multiple sub-files (the first sub-files), verifies the attribute data of each sub-file, and uploads the attribute data that passes verification. The second server may verify the sub-files serially or in parallel; with parallel processing, the server can verify multiple sub-files at the same time, improving verification efficiency.
Compared with the prior art, this embodiment of the present application greatly lowers the technical threshold for CP operations and offers higher usability. Automated verification and storage of commodity data also greatly improves the efficiency of commodity data management. With reference to the embodiment shown in FIG. 2A, the first server in this embodiment is the server that executes S102-S1032, and the second server is the server that executes S104-S109.
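The first server's splitting step can be sketched with Python's `csv` module. CSV is assumed here as a stand-in for the "data table format", and `rows_per_file` is a hypothetical parameter; the description does not say how sub-file size is chosen. Each sub-file repeats the header row so that it can be verified independently of the others.

```python
import csv
import io


def split_table(csv_text, rows_per_file):
    """Split a CSV commodity table into sub-files that each repeat the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to split

    def to_csv(rows):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        return buf.getvalue()

    sub_files, chunk = [], []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_file:
            sub_files.append(to_csv(chunk))
            chunk = []
    if chunk:
        sub_files.append(to_csv(chunk))  # last, possibly shorter, sub-file
    return sub_files
```

Self-contained sub-files are what allow the second server (or several servers) to verify them serially or in parallel.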
In a first possible implementation manner of the third aspect, performing attribute data verification on each first sub-file and storing the attribute data that passes verification in the database specifically includes:
The second server selects one sub-file from the at least one first sub-file as the second sub-file.
The second server performs attribute data verification on the second sub-file and uploads the attribute data in the second sub-file that passes verification to the database.
After completing verification of the second sub-file, the second server returns to the operation of obtaining one sub-file from the at least one first sub-file, until all first sub-files have been verified.
In this embodiment of the present application, the second server repeatedly selects a sub-file (the second sub-file) from these sub-files and processes it, verifying the data of each subtask and storing the commodity data in the sub-file in the database as it goes. This automates verification and storage of commodity data and greatly improves the efficiency of commodity data management.
In addition, the second server may be one specific server, or any server in a server cluster containing multiple servers. When the second server is any server in a server cluster, multiple servers can process and verify sub-files concurrently. Compared with a single server, this greatly improves the speed and reliability of sub-file verification and therefore the efficiency of importing commodity data into the database.
On the basis of the first possible implementation manner of the third aspect, in a second possible implementation manner of the third aspect, before one sub-file is selected from the at least one first sub-file as the second sub-file, the following is further included:
The first server is further configured to create, in the database, first subtasks in one-to-one correspondence with the first sub-files.
The second server selecting one sub-file from the at least one first sub-file as the second sub-file includes:
The second server determines a second subtask from the first subtasks stored in the database, and obtains the sub-file among the at least one first sub-file that is associated with the second subtask.
The second server returning to the operation of obtaining one sub-file from the at least one first sub-file until all first sub-files have been verified includes:
The second server returns to the operation of determining a second subtask from the first subtasks stored in the database, until all first subtasks have been executed.
In this embodiment of the present application, the first server creates subtasks (the first subtasks) in the database in one-to-one correspondence with the sub-files, and each sub-file is selected by determining the subtask to be executed (the second subtask). The database can thus record the verification status of each sub-file, and the second server can easily determine which sub-file to verify next. When the second server is any server in a server cluster, creating subtasks in the database makes it much easier for every server in the cluster to obtain and verify sub-files, further improving sub-file processing efficiency.
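The subtask table and the second server's loop can be sketched with `sqlite3` as a stand-in for the database. The `subtask` schema and its `status` values are hypothetical. This single-worker sketch does not guard against two servers picking the same row; in the described system that is handled by the distributed lock acquired before execution.

```python
import sqlite3


def create_subtasks(conn, sub_file_paths):
    """First server: create one subtask row per sub-file (hypothetical schema)."""
    conn.execute("""CREATE TABLE IF NOT EXISTS subtask (
        id INTEGER PRIMARY KEY,
        sub_file TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending')""")
    conn.executemany("INSERT INTO subtask (sub_file) VALUES (?)",
                     [(path,) for path in sub_file_paths])
    conn.commit()


def run_worker(conn, verify_sub_file):
    """Second server: keep taking a pending subtask until none remain."""
    done = 0
    while True:
        row = conn.execute(
            "SELECT id, sub_file FROM subtask WHERE status = 'pending' LIMIT 1").fetchone()
        if row is None:
            return done  # every subtask has been executed
        task_id, sub_file = row
        verify_sub_file(sub_file)
        conn.execute("UPDATE subtask SET status = 'done' WHERE id = ?", (task_id,))
        conn.commit()
        done += 1
```

Recording status in the database rather than in the worker's memory is what lets any server in the cluster see which sub-files remain.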
On the basis of the second possible implementation manner of the third aspect, as a third possible implementation manner of the third aspect, the operation of determining a second subtask from the first subtasks stored in the database includes:
The second server sends a task query request to the database.
In response to the received task query request, the database selects the subtasks to be executed from the first subtasks and sends them to the second server, where the subtasks to be executed include first subtasks that have not been executed and first subtasks that are being executed but whose execution duration exceeds the duration threshold.
The second server determines the second subtask from the received subtasks to be executed.
In this embodiment of the present application, whether a subtask is to be executed is determined by its execution state. In practice, on the one hand, unexecuted subtasks need to be processed by a server. On the other hand, a server may fail and be unable to process a subtask normally, for example because it has gone down. Such a subtask is nominally being executed but can no longer complete: however long one waits for that server, the subtask will not be processed and its sub-file will not be verified, so other servers must reprocess it. For both reasons, this embodiment treats unexecuted subtasks and executing subtasks whose execution time has exceeded the threshold as subtasks to be executed. The server obtains all subtasks currently pending execution and determines from them the second subtask to execute.
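The database-side filter for subtasks to be executed can be sketched as a single query. The schema (`status`, `started_at` in seconds) and the explicit `now` argument are assumptions made for illustration; passing `now` in rather than reading the clock inside the query keeps the rule easy to test.

```python
import sqlite3


def create_table(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS subtask (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,   -- 'pending', 'running' or 'done'
        started_at REAL         -- seconds; NULL until a server starts it
    )""")


def pending_subtasks(conn, now, threshold_s):
    """Subtasks to execute: never started, or running longer than threshold_s."""
    rows = conn.execute(
        """SELECT id FROM subtask
           WHERE status = 'pending'
              OR (status = 'running' AND ? - started_at > ?)
           ORDER BY id""",
        (now, threshold_s),
    ).fetchall()
    return [task_id for (task_id,) in rows]
```

A subtask stuck in `running` on a crashed server thus reappears in the result once its execution time exceeds the threshold, without anyone having to mark it as failed.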
On the basis of the third possible implementation manner of the third aspect, as a fourth possible implementation manner of the third aspect, the operation of the second server determining the second subtask from the received subtasks to be executed includes:
The second server requests, from a cache component, a distributed lock on each subtask to be executed in turn.
When the second server obtains the distributed lock on a subtask to be executed, it takes that subtask as the second subtask.
As an optional embodiment of the present application, to improve subtask processing efficiency, multiple servers may process the subtasks simultaneously; in that case, the server executing each solution of the first aspect is one of the servers processing subtasks. In practice, multiple servers may select the same subtask at the same time, which lowers the efficiency of commodity data processing. To prevent a single subtask from being processed repeatedly by multiple servers at once, in this embodiment of the present application, after receiving the subtasks to be executed, the server first selects one of them and tries to acquire a distributed lock on it from the cache component. Because the distributed lock on a single subtask can be granted to only one server, the server can in theory acquire the lock if no other server is processing the subtask. Conversely, if another server is already processing the subtask, then, since a distributed lock must be acquired before execution, the cache component records that the lock on this subtask is held by another server, and the lock cannot be acquired.
Based on this principle, once the server has acquired the distributed lock and completed the locking operation, it determines that this subtask is the one to execute this time and downloads the corresponding sub-file. Conversely, if acquiring the distributed lock fails, the server repeats the subtask selection operation to select another suitable subtask.
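The lock-then-claim step can be sketched with an in-memory set-if-absent store. `LockStore`, `claim_subtask` and the `subtask-lock:` key prefix are hypothetical. The dictionary is only safe within one process; across servers, the cache component must perform the set-if-absent atomically, which is what, for example, Redis's `SET key value NX` provides.

```python
class LockStore:
    """In-memory stand-in for the cache component's set-if-absent lock."""

    def __init__(self):
        self._owners = {}  # lock key -> server id

    def try_acquire(self, key, owner):
        # Only one server can hold a given subtask's lock.
        if key in self._owners:
            return False
        self._owners[key] = owner
        return True


def claim_subtask(store, pending_ids, server_id):
    """Try each pending subtask in turn; return the first one locked, else None."""
    for task_id in pending_ids:
        if store.try_acquire(f"subtask-lock:{task_id}", server_id):
            return task_id  # this server now downloads and verifies the sub-file
    return None  # every candidate is held by another server
```

Two servers given the same candidate list end up with different subtasks, because the second server's attempt on the first subtask fails and it moves on to the next one.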
On the basis of the fourth possible implementation manner of the third aspect, as a fifth possible implementation manner of the third aspect, during attribute data verification of the second sub-file, the second server is further configured to:
Determine whether the verification duration of the second sub-file has reached a duration threshold.
If the verification duration of the second sub-file has reached the duration threshold, release the distributed lock on the second sub-file.
This prevents a faulty server from occupying a subtask for a long time and thereby lowering subtask execution efficiency. The server tracks how long it has been verifying the subtask and determines whether the duration threshold has been reached. If so, its verification of the subtask has timed out, possibly because the server itself is faulty. The server therefore releases the distributed lock on the sub-file so that other servers can execute the subtask. This achieves automatic node takeover of subtasks and greatly improves the reliability of subtask execution.
Based on the fourth possible implementation of the third aspect, as a sixth possible implementation of the third aspect, the commodity data management system further includes a cache component.
The cache component is configured to start timing after allocating the distributed lock on the second sub-file to the second server.
The cache component is further configured to release the distributed lock on the second sub-file when the timed duration reaches the duration threshold.
This prevents a faulty server from occupying a subtask for a long time, which would reduce subtask execution efficiency.
When allocating the distributed lock, the cache component also starts timing it and checks whether the duration threshold has been reached. If it has, the server's verification of the subtask has timed out, possibly because the server is faulty. The distributed lock on the sub-file is therefore released so that other servers can execute the subtask. This achieves automatic node takeover of subtasks and greatly improves the reliability of subtask execution.
Based on any one of the first to sixth possible implementations of the third aspect, as a seventh possible implementation of the third aspect, if the commodity data contains attribute data that fails verification, the second server obtains exception information for the failed attribute data and stores the exception information in the database.
In this embodiment of the present application, the parsed attribute data is checked for validity, that is, each item of attribute data is checked for problems such as missing or erroneous data. If such problems exist, verification of that attribute data fails. The embodiment then uploads the exception information corresponding to the data parsing exception to the database, which records it. The exception information can then be fed back to the CP, or the CP can query it directly. This lets the CP quickly identify which data is problematic and perform targeted checks and re-warehousing, which improves the efficiency of warehousing commodity data.
Based on any one of the first to seventh possible implementations of the third aspect, as an eighth possible implementation of the third aspect, the commodity data is data in a data table format.
In this embodiment of the present application, the format of the commodity data is set to a data table format. A data table is a common data recording format and is the format many CPs already use when organizing commodity data day to day. Therefore, if commodity data is required in a data table format, the CP only needs to lightly reorganize its existing commodity data to obtain the commodity data required by the commodity management system. This greatly lowers the technical threshold and workload for the CP and thus improves the efficiency of commodity data management.
Based on any one of the first to eighth possible implementations of the third aspect, as a ninth possible implementation of the third aspect, performing attribute data verification on the second sub-file and uploading the attribute data in the second sub-file that passes verification to the database includes:
While verifying the attribute data of the second sub-file, the second server uploads the attribute data in the second sub-file that has passed verification to the database. Alternatively, after attribute data verification of the second sub-file is complete, the second server uploads the attribute data in the second sub-file that passed verification to the database.
For the beneficial effects of this embodiment, refer to the description of the beneficial effects of the eighth possible implementation of the first aspect; details are not repeated here.
Based on any one of the first to eighth possible implementations of the third aspect, as a ninth possible implementation of the third aspect, the attribute data in the commodity data includes a download address of a commodity image, and the method further includes:
The second server downloads the commodity image according to the commodity image download address contained in the attribute data that has passed verification.
The second server performs image feature analysis on the commodity image to obtain image feature data.
The second server stores the image feature data in the feature library.
In this embodiment of the present application, image feature analysis is performed on warehoused commodities and the results are stored in the feature library for use in subsequent commodity searches by users. The embodiment therefore provides data support for subsequent commodity searches.
A fourth aspect of the embodiments of the present application provides a commodity data management apparatus, including:
A commodity data acquisition module, configured to acquire commodity data and split the commodity data into at least one first sub-file, where each first sub-file contains attribute data of at least one commodity.
A warehousing module, configured to perform attribute data verification on each first sub-file and store the attribute data that passes verification in the database.
In a first possible implementation of the fourth aspect, the warehousing module includes:
A file selection module, configured to select one sub-file from the at least one first sub-file as a second sub-file.
A data verification module, configured to perform attribute data verification on the second sub-file and upload the attribute data in the second sub-file that passes verification to the database.
After verification of the second sub-file is complete, the process returns to the operation of selecting one sub-file from the at least one first sub-file as the second sub-file, until all first sub-files have been verified.
A loop module, configured to obtain a second sub-file, where the second sub-file is a sub-file selected from the at least one first sub-file.
A fifth aspect of the embodiments of the present application provides a commodity search apparatus, including:
An image receiving module, configured to receive a first commodity image uploaded by a user terminal.
An image analysis module, configured to perform image feature analysis on the first commodity image to obtain first image feature data.
A feature matching module, configured to determine, from the image feature data stored in the feature library, at least one item of second image feature data having the highest degree of feature match with the first image feature data.
A commodity search module, configured to send to the user terminal the second commodity images in one-to-one correspondence with the at least one item of second image feature data, together with the attribute data associated with the second commodity images, where the sent second commodity images and associated attribute data are sorted based on trademark information contained in the first commodity image.
A sixth aspect of the embodiments of the present application provides a server. The server includes a memory and a processor, and the memory stores a computer program executable on the processor. When the processor executes the computer program, the server implements the steps of the commodity data management method according to any one of the first aspect, or the steps of the commodity search method according to any one of the second aspect.
A seventh aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the server implements the steps of the commodity data management method according to any one of the first aspect, or the steps of the commodity search method according to any one of the second aspect.
An eighth aspect of the embodiments of the present application provides a computer program product. When the computer program product runs on a server, the server performs the commodity data management method according to any one of the first aspect, or implements the steps of the commodity search method according to any one of the second aspect.
A ninth aspect of the embodiments of the present application provides a chip system. The chip system includes a processor coupled to a memory, and the processor executes a computer program stored in the memory to implement the commodity data management method according to any one of the first aspect, or to cause the server to implement the steps of the commodity search method according to any one of the second aspect.
The chip system may be a single chip or a chip module composed of multiple chips.
It can be understood that, for the beneficial effects of the fourth to ninth aspects, reference may be made to the related descriptions of the first or second aspect; details are not repeated here.
Description of Drawings
FIG. 1A is a schematic diagram of a commodity search interface according to an embodiment of the present application;
FIG. 1B is a schematic diagram of a commodity data upload interface according to an embodiment of the present application;
FIG. 2A is a system interaction diagram of a commodity management system according to an embodiment of the present application;
FIG. 2B is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2C is a schematic flowchart of applying for a subtask distributed lock in the commodity data management method according to an embodiment of the present application;
FIG. 2D is a schematic flowchart of sub-file verification in the commodity data management method according to an embodiment of the present application;
FIG. 2E is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 3A is a system interaction diagram of a commodity management system according to an embodiment of the present application;
FIG. 3B is a commodity management service architecture diagram of a commodity management system according to an embodiment of the present application;
FIG. 3C is an interaction diagram of a commodity management service scenario of a commodity management system according to an embodiment of the present application;
FIG. 4A is a schematic flowchart of commodity detection in the commodity data management method according to an embodiment of the present application;
FIG. 4B is a schematic flowchart of image search in the commodity data management method according to an embodiment of the present application;
FIG. 4C is a schematic diagram of a logical architecture of commodity search according to an embodiment of the present application;
FIG. 4D is a schematic diagram of a logical architecture of commodity search according to an embodiment of the present application;
FIG. 5A is a system interaction diagram of text search in a commodity management system according to an embodiment of the present application;
FIG. 5B is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 5C is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 5D is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 5E is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 6 is a system interaction diagram of image search in a commodity management system according to an embodiment of the present application.
Detailed Description of Embodiments
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and techniques are set forth to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
For ease of understanding, the embodiments of the present application are first briefly described here:
In practical applications, there are scenarios in which commodity exposure needs to be improved, for example:
Scenario 1: For merchants: merchants list commodities on e-commerce platforms to gain exposure and sales. In practice, however, there are many e-commerce platforms, and each platform often contains a large number of commodities. As a result, a merchant's commodities often have low exposure and a low probability of being discovered or purchased by users.
Scenario 2: For e-commerce platforms and sales websites: a single e-commerce platform may host a large number of merchants and the commodities they list. The platform can push or display commodities to users through various recommendation or ranking algorithms, but the pushed or displayed commodities are often a tiny fraction of all listed commodities. For example, a single mainstream e-commerce platform may have hundreds of millions of listed commodities, while only a few thousand may be pushed or displayed to a user. Most commodities are therefore hard for users to discover or purchase. The platform's commodity exposure is consequently low, which hinders the platform's development.
For these scenarios in which commodity exposure needs to be improved, one optional solution is to provide users with a commodity search service, so that users can search for commodities on e-commerce platforms and sales websites according to their actual needs. For example, FIG. 1A is a schematic diagram of a commodity search interface, in which the user can enter a commodity name and search for the commodity as needed. However, commodity search presupposes that a content provider (CP) supplies commodity data to the commodity management system and that the commodity data is stored in a database (also called warehousing), so that the commodity management system can provide the commodity search service based on that data. For example, FIG. 1B is a schematic diagram of a commodity data upload interface. The CP can organize commodity data according to its own needs and the requirements of the commodity management system, and then upload the commodity data to the database of the commodity management system.
The embodiments of the present application automate the verification and warehousing of commodity data. In the prior art, the CP must organize structured commodity data itself and upload it to a storage bucket of the commodity management system. By comparison, the embodiments of the present application greatly lower the technical threshold of the CP's commodity data operations and achieve efficient management of commodity data, avoiding the low management efficiency caused by complex and tedious commodity data operations.
In addition, some terms and concepts that may be involved in the embodiments of the present application are described as follows:
Commodity data: data consisting of the attribute data of one or more commodities; for example, commodity data may consist of commodity names, prices, and links. The number of commodities contained in the commodity data may be determined by the CP according to the actual situation. In addition, in the embodiments of the present application, when setting the format of the commodity data, a technician may also set requirements for providing commodity attribute data. The requirements may specify mandatory attribute data and optional attribute data, and the CP then provides attribute data accordingly. The types and amount of attribute data actually contained in the commodity data therefore depend on the attribute data provision requirements set by the technician and on the data the CP actually provides, and are not limited here. Nor do the embodiments limit the content of the attribute data provision requirements, which technicians may set according to actual needs. In the embodiments of the present application, when the CP terminal uploads the attribute data in the form of a file, the commodity data may also be called a commodity file. It should be noted that, in the embodiments of the present application, the commodity data may be structured or unstructured data.
Because the commodity data is split into multiple sub-files for verification and only attribute data that passes verification is stored in the database, the embodiments of the present application can verify and warehouse commodity data efficiently whether it is structured or unstructured, which improves the efficiency of commodity data management. Correspondingly, when the commodity data is structured, storing the attribute data of the sub-files in the database is still a process of structured storage of the commodity data.
As an example, assume that the attribute data provision requirements include: the commodity identifier (identity document, ID), category, name, price, image address, web page link, and application (App) link. The commodity ID, category, name, and price are mandatory attribute data, while the image address, web page link, and App link are optional. The commodity data is required to be provided in a data table format. The image address is the download address of the commodity image. On this basis, the CP prepares the commodity attribute data according to these requirements and compiles the corresponding data table from the attribute data actually available. For example, the commodity data provided by the CP may be as shown in Table 1 below (in this case there are 4 commodities):
Table 1
[Table 1 is published as an image: PCTCN2021116999-appb-000001]
Commodity information: commodity data is the raw data provided by the CP. To manage the commodity data, the embodiments of the present application store it in a database (that is, warehouse it). To distinguish commodity data before and after storage, the commodity data stored in the database is called commodity information. It should be noted that when the commodity data contains data that cannot be stored in the database in a structured manner, such as images, that data is treated as data outside the commodity information and stored somewhere other than the database, for example on a network storage platform. In that case, the commodity data to be managed consists of two parts: the commodity information and the unstructured data. It should also be noted that, in the embodiments of the present application, commodity information is always text-type information.
Database (DB) and warehousing: a database is a data repository that organizes, stores, and manages data according to a data structure. In the embodiments of the present application, the database is used to store commodity information.
It should be noted that structured data, also called row data, is data logically expressed and implemented by a two-dimensional table structure (that is, data stored in the form of a two-dimensional table). It strictly follows data format and length specifications and is mainly stored and managed in relational databases. In the embodiments of the present application, commodity information is structured commodity data and thus belongs to structured data. Because a database stores data according to a defined data structure, warehousing commodity data means storing it according to the database's structural requirements. The warehousing process therefore already includes structuring the commodity data.
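To make structured storage concrete, the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, Oracle, or SQL Server. The table name `commodity_info` and its columns are hypothetical, modelled on the example attributes listed for Table 1; optional attributes are stored as NULL when absent.

```python
import sqlite3

# sqlite3 stands in for MySQL/Oracle/SQL Server; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE commodity_info (
        id        TEXT PRIMARY KEY,
        category  TEXT NOT NULL,
        name      TEXT NOT NULL,
        price     TEXT NOT NULL,   -- kept as text to avoid float rounding
        image_url TEXT,            -- optional attributes may be NULL
        web_url   TEXT,
        app_url   TEXT
    )
""")


def store(rows: list[dict]) -> None:
    """Insert verified rows; each dict becomes one row of the two-dimensional table."""
    conn.executemany(
        "INSERT OR REPLACE INTO commodity_info VALUES "
        "(:id, :category, :name, :price, :image_url, :web_url, :app_url)",
        # Fill absent optional attributes with None so every placeholder is bound.
        [{"image_url": None, "web_url": None, "app_url": None, **r} for r in rows],
    )
    conn.commit()
```

The `NOT NULL` constraints mirror the mandatory attributes, so the database itself rejects a row that slipped past verification without them.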
The terminal device on which the database resides is not limited here; for example, it may reside on a single server or on a server cluster. The type of database is likewise not limited and may be selected or configured by technicians according to actual needs, for example MySQL, Oracle, or SQL Server. Correspondingly, the structure of the two-dimensional table that stores commodity information may be determined by the specific database type and is not limited here.
Format (format of commodity data): to facilitate structured storage of commodity data (that is, storage in the database), in the embodiments of the present application a technician sets the format of the commodity data in advance. The CP then organizes the commodity attribute data according to this format to obtain commodity data that meets the requirements. For example, if the format is a data table, the CP records the commodity attribute data in a data table to obtain commodity data in data table format. The embodiments of the present application impose no particular requirements on the specific format, which may be set by technicians.
As an optional embodiment of the present application, a technician may set the format of the commodity data to a data table and at the same time set the corresponding attribute data provision requirements (for example, which attribute data must be provided and which may optionally be provided). The CP then organizes the commodity attribute data according to these requirements and records it in the data table.
In some optional embodiments, a technician may implement the format of the commodity data and the attribute data provision requirements by means of a data table template, that is, the technician presets the required attributes in the template. One optional example of a data table template is shown in Table 2 below:
Table 2

| Attribute 1 | Attribute 2 | Attribute 3 | Attribute 4 | Attribute 5 | Attribute 6 | Attribute 7 |
|---|---|---|---|---|---|---|
| | | | | | | |
| | | | | | | |
In this embodiment, a technician fills in the first row of the data table template in advance with each attribute the CP needs to provide (Attribute 1 to Attribute 7). The specific attributes are not limited here and may be set by technicians; for example, they may be the ID, category, name, price, image address, web page link, and App link from Table 1, or other attributes. The CP then fills in or imports the attribute data of each commodity according to the preset attributes to complete the data table template.
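Reading a filled-in template can be sketched as follows, assuming the table is exported as CSV and the template's first row holds the example attribute names from Table 1; `TEMPLATE` and `read_filled_template` are hypothetical. Blank rows left in the template are skipped, and a header that no longer matches the template is rejected.

```python
import csv
import io

# Hypothetical template header, using the example attributes from Table 1.
TEMPLATE = ["ID", "Category", "Name", "Price", "Image address", "Web link", "App link"]


def read_filled_template(table_text: str) -> list[dict]:
    """Parse a filled-in data table template into one dict per commodity."""
    reader = csv.reader(io.StringIO(table_text))
    header = next(reader, [])
    # If the CP edited the preset first row, attribute positions are ambiguous.
    if [h.strip() for h in header] != TEMPLATE:
        raise ValueError(f"header does not match template: {header}")
    rows = []
    for values in reader:
        if not any(v.strip() for v in values):
            continue  # skip blank rows left over in the template
        padded = values + [""] * (len(TEMPLATE) - len(values))
        rows.append(dict(zip(TEMPLATE, padded)))
    return rows
```

Keying each row by the template's own attribute names means later verification can look up mandatory attributes by name rather than by column position.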
It should also be noted that the format of the commodity data is a data format preset by technicians. Its relationship to warehousing the commodity data is as follows:
In the embodiments of the present application, a technician first sets the format of the commodity data, and the CP then organizes the commodity attribute data according to that format to obtain commodity data conforming to it. The CP uploads the organized commodity data to the commodity management system, which warehouses it. It follows that the commodity data uploaded by the CP (that is, the raw commodity data processed by the commodity management system) already meets the format requirements.
CP terminal: the terminal device the CP uses to upload commodity data. The embodiments of the present application do not particularly limit the device type of the CP terminal, which may be determined by the actual application scenario; for example, it may be a desktop computer, laptop, tablet, or mobile phone. It should be understood that the devices the CP uses to prepare and to upload commodity data need not be the same; for example, the CP may prepare commodity data on a laptop and upload it from a mobile phone. In principle, therefore, the CP terminal must be able to upload data, but it need not be able to edit commodity data, such as adding, deleting, modifying, or verifying its content, or adjusting its format.
User terminal: the terminal device a user uses to search for commodities. In practice, a user searches for commodities by entering text or an image on the user terminal and uploading it to the commodity management system. The user may be a consumer or someone else, such as a CP, depending on the application scenario. The embodiments of the present application do not particularly limit the device type of the user terminal, which may be determined by the actual application scenario; for example, it may be a desktop computer, laptop, tablet, mobile phone, or wearable device.
Network Storage Platform (NSP): stores the commodity data uploaded by the CP terminal as well as commodity pictures. In principle, any device with data storage and data transmission capabilities can serve as the NSP in this embodiment; in practice, technicians can choose the specific device type and number of NSP devices according to actual needs. For example, the NSP may be a single server that stores documents and pictures, or a cluster of such servers.
Feature library: to support searching for commodities by picture, this embodiment may analyze commodity pictures to extract image feature data, which is used for image matching during picture search. In this embodiment, the feature library is the data warehouse that stores this image feature data. Depending on actual needs, the feature library may also store data other than image feature data, as configured by technicians. This embodiment does not restrict the device on which the feature library resides; technicians can choose it according to actual needs, for example a single server or a server cluster. In this embodiment, the feature library may also be referred to as the commodity base library.
Cache component: provides distributed-lock management. In this embodiment, a locking mechanism may be used to prevent a single task from being executed by multiple servers at the same time. Before a server executes a task, it first requests a distributed lock for that task from the cache component. Once the server obtains the lock, the task is locked and the server may execute it; meanwhile, no other server can obtain the lock for that task from the cache component, so no other server can execute the task. This embodiment does not restrict how the cache component implements distributed locks; technicians can choose the implementation according to actual needs, for example a lock based on a distributed cache (DCS Redis) or one based on ZooKeeper. Likewise, this embodiment does not restrict the device on which the cache component resides; technicians can choose it according to actual needs, for example as a component running on the server. In this embodiment, when the distributed lock is implemented on a distributed cache, the cache component may also be referred to as the DCS distributed lock.
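The locking behaviour described for the cache component can be illustrated with a minimal in-memory sketch. It is not the DCS Redis or ZooKeeper implementation: the class `InMemoryLockService`, its methods, the random owner token, and the expiry (TTL) are all hypothetical, modelled loosely on the common "set if absent, with expiry" pattern so that a crashed lock holder does not block a task forever.

```python
import threading
import time
import uuid


class InMemoryLockService:
    """Hypothetical stand-in for the cache component (not a real DCS client).

    Loosely mimics the "set if absent, with expiry" locking pattern; the
    expiry is an assumption added for illustration, not from the source.
    """

    def __init__(self):
        self._locks = {}  # task_id -> (owner_token, expires_at)
        self._mutex = threading.Lock()

    def try_acquire(self, task_id, ttl_seconds=30.0):
        """Return an owner token if the lock was obtained, else None."""
        now = time.monotonic()
        with self._mutex:
            holder = self._locks.get(task_id)
            if holder is not None and holder[1] > now:
                return None  # another server still holds a live lock
            token = uuid.uuid4().hex
            self._locks[task_id] = (token, now + ttl_seconds)
            return token

    def release(self, task_id, token):
        """Release the lock only if the caller still owns it."""
        with self._mutex:
            holder = self._locks.get(task_id)
            if holder is not None and holder[0] == token:
                del self._locks[task_id]
                return True
            return False


lock_service = InMemoryLockService()
token_a = lock_service.try_acquire("task-1")   # server A locks the task
token_b = lock_service.try_acquire("task-1")   # server B is refused
print(token_a is not None, token_b is None)    # True True
lock_service.release("task-1", token_a)
print(lock_service.try_acquire("task-1") is not None)  # True
```

Releasing by owner token means one server cannot accidentally free a lock that another server has since acquired.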
To illustrate the technical solution of this application, the following takes a data table as the commodity data format set by the technician, and describes two parts through specific embodiments: management of commodity data, and commodity search by users.
Part 1: Management of commodity data by the commodity management system.
In this embodiment, the commodity management system includes at least one server, a database, and an NSP. In some optional embodiments, the commodity management system may further include a cache component and a feature library. FIG. 2A is a system interaction diagram of the commodity management system during management of commodity data, described in detail as follows:
S101: The CP terminal uploads the commodity data to the NSP.
The CP first prepares the commodity data in the format set by the technician. Taking a data table template as the set format, assume the technician provides the data table template shown in Table 3 below:
Table 3
[Table 3: data table template, reproduced as images PCTCN2021116999-appb-000002 and PCTCN2021116999-appb-000003]
The following rules are also set: ID, category, name, price, and picture address are mandatory attribute data; currency identifier and picture ID are optional attribute data; and at least one of the web page link, App link, and quick app link must be filled in.
Here, ID is the serial number of the commodity. Category is the class of the commodity; commodities can be classified in different ways according to different needs, for example into clothing, digital appliances, shoes, bags, home, toys, beauty, accessories, food, and other. The currency identifier identifies the currency in which the price is expressed; for example, ¥ for RMB, $ for US dollars, and £ for pounds sterling. The picture address is the download address of a commodity picture. In practice, a CP may provide many commodity pictures, and uploading them one by one would be tedious; this embodiment therefore provides the picture address attribute. The CP can store the commodity pictures on some servers and fill in the corresponding picture addresses in the data table template, which completes the provision of the pictures. Picture ID is the serial number of a commodity picture. The web page link (weburl) is the link to the commodity's sales web page; opening it launches a browser and navigates to that page. The web page link may be an ordinary web page link or an HTML5 web page link. The App link is the link to the commodity's sales page within an App; opening it launches the corresponding App and jumps to the sales page in that App. The quick app link is the link to the commodity's sales page within a quick app; opening it launches the corresponding quick app and jumps to the sales page in that quick app. One or more links may be filled in for each of the web page link, App link, and quick app link, allowing jumps to different e-commerce platforms. For example, three different web page links may be provided, corresponding to the commodity's sales web pages on three different e-commerce platforms.
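The mandatory, optional, and "at least one link" rules for a Table 3 row can be checked with a short function. This is only a sketch: the field keys (`id`, `category`, `name`, `price`, `picture_address`, `web_url`, `app_url`, `quick_app_url`) are hypothetical names for the attributes described above, and treating blank strings and empty link lists as "not filled in" is an assumption.

```python
REQUIRED = ("id", "category", "name", "price", "picture_address")
LINKS = ("web_url", "app_url", "quick_app_url")  # at least one must be filled in
# Optional fields (currency identifier, picture ID) need no check here.


def _is_filled(value):
    """Treat None, blank strings and empty link lists as "not filled in"."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple)):
        return any(_is_filled(v) for v in value)
    return True  # numbers and other values count as filled


def validate_row(row):
    """Return a list of problems found in one commodity row (empty means valid)."""
    problems = [f"missing required field: {f}" for f in REQUIRED if not _is_filled(row.get(f))]
    if not any(_is_filled(row.get(f)) for f in LINKS):
        problems.append("at least one of web_url, app_url, quick_app_url is required")
    return problems


row = {
    "id": "1", "category": "food", "name": "Organic tea", "price": "39.9",
    "picture_address": "https://example.com/tea.jpg",
    # Several web page links, one per e-commerce platform (hypothetical URLs).
    "web_url": ["https://shop-a.example/tea", "https://shop-b.example/tea"],
}
print(validate_row(row))  # []
```

Returning a list of problems, rather than stopping at the first error, lets a whole row's issues be reported back to the CP at once.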
Based on Table 3, the CP can fill in or import commodity attribute data according to actual conditions to prepare the commodity data. (In practice, a CP usually organizes commodity data already when managing commodity inventory, so the CP's existing commodity data can simply be imported into the data table template, adding very little to the CP's workload.) The number of commodities may be a few, or thousands or tens of thousands, as determined by the CP. In principle, the attribute data above must be filled in for every commodity, and each row of the table holds the data of one commodity; the number of rows in Table 3 is therefore determined by the actual number of commodities. For example, referring to Table 1, the number of commodities is 4.
As an optional embodiment of this application, technicians can add or delete attributes in Table 3 according to actual requirements. For example, the quick app link may be deleted; attributes such as serial number, color, and size may be added; or the price attribute may be split into two attributes, minimum price and maximum price. No particular limitation is imposed here.
As an embodiment of this application, to describe commodity details more fully, a commodity description attribute may also be added to Table 3. The CP can then enter descriptive text for each commodity in Table 3 to help users understand the commodity in depth; for example, "This product is a green organic food."
After preparing the commodity data, the CP uploads it to the NSP through the CP terminal. For example, when the commodity data is a spreadsheet file containing Table 3 (after the attribute data has been filled in or imported), the CP can upload the file to the NSP from a mobile phone, a computer, or a similar device.
As an embodiment of this application, to make uploading convenient for the CP, a portal website for uploading commodity data may be set up in advance. The CP then accesses the portal through the CP terminal and uploads the commodity data from the portal interface.
As an optional embodiment of this application, in some cases the CP terminal may be unable to transfer data directly to the NSP. In that case, an intermediate device, such as a server, may be placed between the CP terminal and the NSP. The server provides the CP terminal with a callable application programming interface (API); the CP terminal sends the commodity data to the intermediate device by calling this API, and the intermediate device uploads it to the NSP, completing the upload.
S102: The server downloads the commodity data from the NSP and splits it into one or more sub-files, each of which contains the attribute data of at least one commodity.
In practice, commodity data often records a large number of commodities. Verifying and storing such data directly is inefficient and error-prone, and the CP has to wait a long time to learn the storage result. Therefore, to improve the efficiency of commodity data management, this embodiment splits the commodity data into multiple sub-files. The format of the sub-files may be the same as or different from that of the commodity data before splitting; for example, when the commodity data is a data table, each sub-file may be a data table or a file in another format. Details are as follows:
After the commodity data is uploaded to the NSP, the server downloads it from the NSP and splits it into one or more sub-files. The splitting rule is not limited here and can be determined by technicians according to actual needs. In principle, the less data a sub-file contains, the faster, more reliably, and more promptly that single sub-file can be processed; however, smaller sub-files mean more sub-files, which lowers overall processing efficiency. Technicians can therefore set the splitting rule to balance the efficiency and reliability requirements of commodity data management. For example, each sub-file may contain m commodities, where m is a positive integer such as 1000; every sub-file then contains the attribute data of m commodities, except that the last sub-file may contain fewer than m. Alternatively, the number of commodities in each sub-file may be any integer in [1, n], where n is an integer greater than 1, such as 1000; this value may be chosen randomly or according to a certain rule.
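Both splitting rules can be sketched as list chunking over the commodity rows. The function names are hypothetical, and rows are modelled as plain dictionaries rather than real table files; writing each chunk out as a sub-file is left out.

```python
import random


def split_fixed(rows, m=1000):
    """Split rows into chunks of m rows; the last chunk may be shorter."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return [rows[i:i + m] for i in range(0, len(rows), m)]


def split_random(rows, n=1000, rng=None):
    """Split rows into chunks whose sizes are drawn uniformly from [1, n]."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    rng = rng or random.Random()
    chunks, i = [], 0
    while i < len(rows):
        size = rng.randint(1, n)  # randint includes both endpoints
        chunks.append(rows[i:i + size])
        i += size
    return chunks


rows = [{"id": str(k)} for k in range(2500)]
print([len(c) for c in split_fixed(rows, m=1000)])  # [1000, 1000, 500]
print(len(split_fixed(rows[:900], m=1000)))          # 1 (only one sub-file)
```

The second call shows the single-sub-file case: with 900 commodities and m = 1000, all rows land in one chunk.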
Splitting the commodity data into sub-files has the following beneficial effects:
1. Each sub-file contains less commodity attribute data than the full commodity data, so the server is less likely to make errors when processing it, and processing is more reliable.
2. A single commodity data file contains a large amount of commodity attribute data, so processing it directly on a single server takes a long time. Splitting the commodity data into sub-files and handing them to one or more servers for parallel processing can greatly shorten processing time and improve processing efficiency. The CP learns the storage result of the commodity data sooner, which also improves the CP's experience with the commodity management system.
Note that when the splitting rule allows a single sub-file to hold at least as many commodities as the commodity data contains in total, the split yields only one sub-file (in which case, in principle, no actual splitting takes place). The number of sub-files obtained in S102 is therefore one or more. For example, suppose the splitting rule specifies 1000 commodities per sub-file, but the commodity data contains fewer than 1000 commodities, such as 900; the attribute data of all commodities then falls into the same sub-file.
As an optional embodiment of this application, a single commodity data file may contain a very large number of commodities in practice, for example attribute data for thousands of commodities. To distinguish commodities from one another and identify each one uniquely, a unique identifier can be added to each commodity in the sub-files during splitting; this identifier is added to the commodity data as a new attribute of the commodity. In some optional embodiments, the CP may provide identifiers such as commodity IDs in the commodity data, but the CP's behavior is outside the commodity management system's control. In practice, CP-provided identifiers may be duplicated, missing, or non-standard, so they are not necessarily unique and are relatively unreliable. In this embodiment, the server itself adds a unique identifier to each commodity, which guarantees the reliability of the identifier and thus accurate distinction between commodities.
In addition, in some optional embodiments, if the CP is required to provide the commodity data as a data table template, all attribute data of a commodity sits in a single row; that is, each row holds all the attribute data of one commodity. The generated unique identifier can therefore be added to the data table template as the commodity's row number attribute, so that the row number of a commodity in the commodity data serves as its unique identifier. This embodiment does not restrict the type or generation method of the unique identifier, which can be set by technicians. For example, the unique identifier may combine the upload time of the commodity data with the serial number of the commodity, or it may be a randomly generated non-repeating string. The length of the unique identifier can also be fixed, for example at 16 characters; if a generated identifier is shorter than this length, it is padded with spaces or zeros.
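The "upload time plus serial number, padded to a fixed 16 characters" example can be sketched as follows. The layout is an assumption chosen for illustration: a 10-character `YYMMDDHHMM` timestamp followed by the row's serial number, left-padded with zeros (or spaces) to fill the remaining 6 characters. `make_unique_id` and `assign_row_ids` are hypothetical helpers. With this layout, two uploads in the same minute would produce overlapping IDs, so a real system would likely add a batch or upload prefix.

```python
from datetime import datetime

ID_LENGTH = 16  # fixed length, following the example in the text


def make_unique_id(upload_time: datetime, serial: int, pad_char: str = "0") -> str:
    """Compose upload time + row serial number into a fixed-length ID.

    Hypothetical layout: 10-char timestamp (YYMMDDHHMM) followed by the
    serial number, left-padded so that the whole ID is 16 characters long.
    """
    stamp = upload_time.strftime("%y%m%d%H%M")  # 10 characters
    width = ID_LENGTH - len(stamp)              # 6 characters remain for the serial
    serial_str = str(serial)
    if len(serial_str) > width:
        raise ValueError("serial number does not fit in the fixed-length ID")
    return stamp + serial_str.rjust(width, pad_char)


def assign_row_ids(rows, upload_time):
    """Attach a server-generated ID to every row, ignoring any CP-supplied ID."""
    return [dict(row, row_id=make_unique_id(upload_time, k))
            for k, row in enumerate(rows, start=1)]


t = datetime(2021, 9, 7, 14, 30)
print(make_unique_id(t, 42))  # 2109071430000042
```

Because the server assigns the IDs from row position, duplicated CP-supplied IDs do not affect uniqueness within one upload.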
S103: The server stores all sub-files in the NSP and obtains the download address of each sub-file in the NSP. Based on these download addresses, it creates in the database one subtask per sub-file, together with a parent task that contains these subtasks.
S103 can be subdivided into S1031 and S1032:
S1031: The server stores all sub-files in the NSP and obtains the download address of each sub-file in the NSP.
S1032: Based on the download addresses, the server creates in the database subtasks in one-to-one correspondence with the sub-files, together with a parent task that contains these subtasks.
After the commodity data has been split into one or more sub-files, the server that performed the split uploads all the resulting sub-files to the NSP and, while storing them, obtains the download address of each sub-file in the NSP. The NSP and the server that executes S103 are separate, independent devices.
After obtaining the download address of each sub-file, the server creates in the database one subtask per sub-file and stores each sub-file's download address in the corresponding subtask. In this embodiment, subtasks are executed by servers. Executing a subtask means that a server downloads the corresponding sub-file using the download address in the subtask, then verifies the attribute data in the downloaded sub-file and stores it in the database. By executing all the subtasks, the servers process every sub-file and ultimately complete storage of the commodity data.
In practice, every subtask must be executed for the commodity data to be stored completely. Therefore, while executing subtasks, a server needs to confirm whether all subtasks of a given commodity data have been executed (the confirmation method can be set by technicians and is not limited here). The commodity management system may need to process multiple commodity data at the same time, so the database may hold subtasks for several commodity data simultaneously; with so many subtasks, management becomes difficult. To make the subtasks of each commodity data easier to manage, the server also creates a parent task containing these subtasks when it creates them. Each commodity data thus corresponds to one parent task, which contains all subtasks of that commodity data. To confirm whether all subtasks of a commodity data have been executed, it suffices to query the execution status of each subtask in the corresponding parent task, which makes subtask management more efficient.
In addition, this embodiment records the execution status of each subtask. A subtask has one of three execution statuses: not executed, executing, and completed. When a server executes a subtask, the database updates the subtask's execution status accordingly.
Not executed means that no server is currently executing the subtask. Executing means that at least one server is currently executing the subtask and no server has yet completed it. Completed means that at least one server has finished executing the subtask. Because executing a subtask in this embodiment essentially means verifying the attribute data in the corresponding sub-file and storing it in the database, not executed means the attribute data in that sub-file has not yet been verified and stored; executing means the data is being verified and stored; and completed means the data has been verified and stored. Every newly created subtask is marked as not executed in the database.
For example, suppose splitting commodity data A yields sub-file a and sub-file b. The server stores both sub-files in the NSP and obtains their download addresses. Suppose the download address of sub-file a in the NSP is https://xxxhuawei.com/filea/huawei.html and that of sub-file b is https://xxxhuawei.com/fileb/huawei.html. The server then creates subtask a and subtask b in the database, along with a parent task A containing both subtasks (the parent task may have no substantive task content and merely record the subtasks it contains). It also stores the download address https://xxxhuawei.com/filea/huawei.html in subtask a and https://xxxhuawei.com/fileb/huawei.html in subtask b.
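The parent/subtask structure, the three execution statuses, and the "all subtasks completed" check can be modelled with a few dataclasses. This is an in-memory sketch, not the database schema: the class names, the `count()`-based ID source, and `create_tasks` are hypothetical, and a real system would persist these records and let the database assign IDs.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Status(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    COMPLETED = "completed"


_ids = count(1)  # hypothetical ID source; a real database would assign IDs


@dataclass
class SubTask:
    download_url: str                      # where the sub-file is fetched from the NSP
    status: Status = Status.NOT_EXECUTED   # newly created subtasks start here
    task_id: int = field(default_factory=lambda: next(_ids))


@dataclass
class ParentTask:
    # The parent carries no work of its own; it only records its subtasks.
    subtasks: list = field(default_factory=list)
    task_id: int = field(default_factory=lambda: next(_ids))

    def all_completed(self) -> bool:
        return all(t.status is Status.COMPLETED for t in self.subtasks)


def create_tasks(download_urls):
    """S1032 sketch: one subtask per sub-file, grouped under one parent task."""
    return ParentTask(subtasks=[SubTask(url) for url in download_urls])


parent_a = create_tasks([
    "https://xxxhuawei.com/filea/huawei.html",
    "https://xxxhuawei.com/fileb/huawei.html",
])
print(parent_a.all_completed())  # False
for sub in parent_a.subtasks:
    sub.status = Status.COMPLETED
print(parent_a.all_completed())  # True
```

Checking completion through the parent keeps the query scoped to one commodity data, even when the database holds subtasks for many uploads.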
As an optional embodiment of this application, an identifier or ID may be assigned to each subtask when it is created, to distinguish subtasks from one another. When the database and a server interact, they identify a subtask uniquely by exchanging its identifier or ID.
As an optional embodiment of this application, after storing the sub-files in the NSP, the server may delete the commodity data uploaded by the CP terminal from the NSP to save NSP storage space.
As an optional embodiment of this application, multiple CPs may upload their own commodity data in practice. To process commodity data more efficiently in this case, multiple servers can be used at the same time, and distributed locks are introduced to avoid wasting server resources.
In scenarios where multiple servers split commodity data and create tasks at the same time, this embodiment introduces a distributed-lock mechanism to prevent a single commodity data from being processed more than once. That is, in S102 the server first applies for a distributed lock on the individual commodity data, and downloads and splits the commodity data only after it has obtained the lock (that is, completed the locking). In this case, S102 can be replaced with S1021: The server requests a distributed lock on the commodity data from the cache component. If it obtains the lock, it downloads the commodity data from the NSP and splits it into one or more sub-files, each of which contains the attribute data of at least one commodity.
As an optional embodiment of this application, refer to FIG. 2B. In this embodiment, multiple servers are responsible for splitting commodity data, thereby decomposing the work into tasks. To prevent a single commodity data from being processed repeatedly by multiple servers, distributed locks are also introduced. Details are as follows:
Each server queries the task list that records commodity data awaiting processing.
If commodity data that needs processing is found, each server applies for the distributed lock on that commodity data; that is, the servers compete for the lock.
The server that wins the lock acts as the executor of S102-S103 and downloads the commodity data from the NSP.
After downloading the commodity data, that server splits it into multiple sub-files.
It stores all the resulting sub-files in the NSP and, based on their download addresses, creates a subtask for each sub-file in the database.
In addition, during splitting, a row number may be added to each commodity in the sub-files as its unique identifier.
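The lock-grabbing flow of FIG. 2B can be simulated with threads standing in for servers. Everything here is hypothetical: `LockTable` is a minimal non-blocking lock table in place of the cache component, the pending list stands in for the task list, and the actual S102-S103 work is reduced to recording which server handled which commodity data. Locks are never released in this sketch; in a real system the task would be marked as handled once processing finishes.

```python
import threading


class LockTable:
    """Hypothetical cache component: a non-blocking, per-key lock table."""

    def __init__(self):
        self._held = set()
        self._mutex = threading.Lock()

    def try_lock(self, key):
        """Return True only for the first caller that locks this key."""
        with self._mutex:
            if key in self._held:
                return False
            self._held.add(key)
            return True


def server_worker(name, pending, locks, processed, processed_mutex):
    """One server: scan the task list and process only what it manages to lock."""
    for data_id in pending:
        if not locks.try_lock(data_id):
            continue  # another server won the lock; skip this commodity data
        # S102-S103 would run here: download, split, store sub-files, create subtasks.
        with processed_mutex:
            processed.append((data_id, name))


pending = ["commodity-data-1", "commodity-data-2", "commodity-data-3"]
locks, processed, processed_mutex = LockTable(), [], threading.Lock()
servers = [
    threading.Thread(target=server_worker,
                     args=(f"server-{i}", pending, locks, processed, processed_mutex))
    for i in range(4)
]
for t in servers:
    t.start()
for t in servers:
    t.join()

# Each commodity data is processed exactly once, by whichever server locked it first.
print(sorted(data_id for data_id, _ in processed))
```

Which server handles which commodity data varies from run to run, but no commodity data is processed twice.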
S104: The server sends a task query request to the database.
After the subtasks have been created in S102-S103, this embodiment processes them to verify the sub-file corresponding to each subtask. To do so, a server first sends a task query request to the database, asking the database to report which subtasks under the current parent task are to be executed. The data content and format of the task query request are not limited here and can be set by technicians as required.
S105: After receiving the task query request, the database selects the subtasks to be executed from all subtasks contained in the parent task and returns them to the server as a subtask list.
In this embodiment, whether a subtask is to be executed is determined by its execution status. Specifically, technicians preset which execution statuses correspond to subtasks to be executed. On receiving a task query request, the database checks the execution status of each subtask under the parent task and selects those whose status meets the requirement. For example, if every subtask whose status is not executed counts as a subtask to be executed, the database returns all not-executed subtasks under the parent task.
另外,根据实际需求,技术人员亦可在执行状态的基础上,增加一些其他的筛选条件,以实现对待执行子任务的精准区分和筛选。例如,在执行状态的基础上,还可以增加任务执行时长的限制。此时,数据库会同时获取子任务的执行状态和执行时长,并筛选出执行状态和执行时长均满足预设要求的子任务,作为待执行子任务。In addition, according to actual needs, technicians can also add some other screening conditions on the basis of the execution status, so as to achieve accurate distinction and screening of subtasks to be executed. For example, on the basis of the execution state, a limit on the execution time of the task can also be increased. At this time, the database will obtain the execution status and execution duration of the subtasks at the same time, and screen out subtasks whose execution status and execution duration both meet the preset requirements, as subtasks to be executed.
As an optional embodiment of the present application, the set of subtasks to be executed can take any of the following ranges, depending on how execution state and execution duration are combined:
1. All unexecuted subtasks.
2. All unexecuted subtasks and all executing subtasks.
3. All unexecuted subtasks and some of the executing subtasks.
In practice, technical personnel may choose any one of the three ranges above as the range of subtasks to be executed. They may also define the range themselves as required. No limitation is imposed here.
As an optional embodiment of the present application, two practical considerations apply. First, unexecuted subtasks must be processed by a server. Second, a server may become unable to process a subtask normally, for example because of downtime. Such a subtask stays in the executing state but can never be completed. However long one waits for that server, the subtask will not finish and its sub-file will not be verified, so it must be reprocessed by another server. Based on these two considerations, this embodiment treats as subtasks to be executed both the unexecuted subtasks and the executing subtasks whose execution duration has timed out. The filtering operation in S105 can then be replaced with:
After receiving the task query request, the database selects, from all subtasks of the parent task, the unexecuted subtasks and the executing subtasks whose execution duration exceeds a duration threshold.
To determine whether a subtask has timed out, this embodiment presets a duration threshold. A subtask whose execution duration exceeds the threshold is judged to have timed out.
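The timeout-aware filter above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It filters an in-memory list rather than querying a database. The names `Subtask`, `State` and `select_pending` and the state labels are all hypothetical. Execution duration is computed as `now - last_update_time`, which matches the definition given later in S106.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class State(Enum):
    # Hypothetical state labels; the text names "not executed" and "executing".
    NOT_EXECUTED = "not_executed"
    EXECUTING = "executing"
    DONE = "done"


@dataclass
class Subtask:
    subtask_id: str
    state: State
    created_at: datetime
    last_update_time: datetime
    download_address: str


def select_pending(subtasks: List[Subtask], now: datetime,
                   threshold: timedelta) -> List[Subtask]:
    """Return unexecuted subtasks plus executing ones whose duration exceeds threshold."""
    pending = []
    for t in subtasks:
        duration = now - t.last_update_time  # execution duration = now() - last_update_time
        if t.state is State.NOT_EXECUTED:
            pending.append(t)
        elif t.state is State.EXECUTING and duration > threshold:
            pending.append(t)  # presumed stuck, e.g. its server went down
    return pending
```

The threshold is a parameter because the text leaves its value to technical personnel. In a database-backed deployment, the same condition would more naturally be expressed as a query predicate.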
After the subtasks to be executed are selected in S105, this embodiment places them in a single list (the subtask list). The list is returned to the server that sent the task query request. Each subtask in the list records the download address of its sub-file, so that the server can download the sub-file for processing. If only one qualifying subtask is selected, the database may skip building a list and return that subtask directly. Alternatively, it may return a subtask list containing only that one subtask. Technical personnel may choose either option.
As an optional embodiment of the present application, the database records the creation time of each subtask, which makes it easier for the server to decide which subtask to execute. After selecting the subtasks, the database prioritizes them by creation time and execution state and generates the subtask list in the sorted order. It then returns the sorted list to the server.
The specific priority rules are not limited here and may be set by technical personnel according to actual needs. For example, unexecuted subtasks may be given higher priority than executing subtasks. Within each group, earlier-created subtasks have higher priority than later-created ones.
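The example priority rule can be expressed as a two-part sort key. The first part places unexecuted subtasks before executing ones, and the second orders each group by creation time, earliest first. The sketch below is minimal, and the names `Subtask` and `build_subtask_list` and the state strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical state labels for the two groups named in the text.
NOT_EXECUTED = "not_executed"
EXECUTING = "executing"


@dataclass
class Subtask:
    subtask_id: str
    state: str
    created_at: datetime


def build_subtask_list(pending: List[Subtask]) -> List[Subtask]:
    """Unexecuted before executing; earliest-created first within each group."""
    group_rank = {NOT_EXECUTED: 0, EXECUTING: 1}
    return sorted(pending, key=lambda t: (group_rank[t.state], t.created_at))
```

Because the list is returned already sorted, a server that simply takes the first entry gets the highest-priority subtask.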
It should also be noted that, as an optional embodiment of the present application, the logic of the operations the database performs in S105 may be built into the database itself. Alternatively, it may be deployed as a program on the terminal device hosting the database. Technical personnel choose between these according to actual needs. In the first case, the database itself performs S105; in the second, the terminal device performs S105.
S106: After receiving the subtask list, the server determines one subtask from the list. Using the download address recorded in that subtask, it downloads the corresponding sub-file from the NSP.
After receiving the subtask list, the server selects one subtask to execute this time. It then downloads the corresponding sub-file from the NSP, using the download address in that subtask, for subsequent data verification. This embodiment does not restrict how the subtask is selected; technical personnel may set this according to the actual situation. In some optional embodiments, for example, the first subtask in the list is selected, or a subtask is chosen at random. In some embodiments, the database has already prioritized the subtasks in the list before sending it. The server may then select the subtask with the highest priority. For example, if the list is sorted in descending order of priority, the server selects the first subtask.
As an embodiment of the present application, after determining the subtask to execute, the server informs the database that the subtask is now executing. This can be done, for example, by sending the database an instruction that carries the subtask ID and execution state. The notification lets the database update the subtask's execution state and provides the data needed to compute its execution duration. When the database learns that the server has selected the subtask, it sets the subtask's execution state to "executing". It also records the time it received the message as the subtask's last update time (last_update_time). The execution duration of the subtask is the difference between the current time and the last update time, i.e., now() - last_update_time. For a subtask that was already executing, only the execution start time needs to be updated; the execution state does not need to change.
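The state update and duration formula can be sketched as below. This is a minimal illustration under stated assumptions: an in-memory dictionary stands in for the database, and `SubtaskRecord`, `SubtaskTable` and their methods are hypothetical names. Re-picking a subtask that is already executing only refreshes `last_update_time`, as the text describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class SubtaskRecord:
    state: str
    last_update_time: Optional[datetime] = None


class SubtaskTable:
    """In-memory stand-in for the database's subtask records (hypothetical)."""

    def __init__(self) -> None:
        self.rows: Dict[str, SubtaskRecord] = {}

    def mark_executing(self, subtask_id: str, received_at: datetime) -> None:
        row = self.rows[subtask_id]
        # For a subtask already "executing", this assignment changes nothing.
        row.state = "executing"
        # Both newly started and re-picked subtasks get a fresh start time.
        row.last_update_time = received_at

    def execution_duration(self, subtask_id: str, now: datetime) -> timedelta:
        # Execution duration = now() - last_update_time
        return now - self.rows[subtask_id].last_update_time
```

Resetting `last_update_time` on every pick means a reclaimed subtask gets a full timeout window again before another server may take it over.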
As an optional embodiment of the present application, multiple servers may process the subtasks of a parent task concurrently to improve processing efficiency. In practice, however, several servers may select the same subtask at the same time, which lowers the efficiency of commodity data processing. To prevent a single subtask from being processed by multiple servers at once, this embodiment introduces a distributed lock. Specifically, referring to FIG. 2C, S106 can be replaced with:
S1061: After receiving the subtask list, the server selects a subtask from the list that it has not selected before. It then applies to the cache component for a distributed lock on that subtask.
S1062: If the distributed lock on the subtask is acquired, the server stops selecting subtasks. It downloads the subtask's sub-file from the NSP using the download address in the subtask.
S1063: If the distributed lock on the subtask is not acquired, the server returns to selecting a not-yet-selected subtask from the list and applying to the cache component for a distributed lock on it.
In this embodiment, after receiving the subtask list, the server first selects a subtask and tries to acquire a distributed lock on it from the cache component. The server can tell the cache component which subtask the lock is for by sending it the subtask's identifier or ID.
The distributed lock for a given subtask can be held by only one server at a time. Therefore, if no other server is processing the subtask, the server should be able to acquire the lock. Every server must acquire the lock before executing a subtask. So if another server is already processing the subtask, the cache component has recorded that the other server holds the lock, and this server cannot acquire it. Once a server acquires the lock, it treats the subtask as the one to execute this time and downloads the corresponding sub-file. If acquiring the lock fails, the server repeats the subtask selection of S1061 to choose another suitable subtask.
This embodiment places no restriction on the subtask selection method beyond requiring that no subtask be selected twice. In some optional embodiments, for example, subtasks may be selected at random or in sequence. As an optional embodiment of the present application, the database may have prioritized the subtasks in the list before sending it. The server may then select subtasks in descending order of priority. For example, when the list is sorted from highest to lowest priority, the selection rule may be: among the subtasks not yet selected, pick the one ranked first. This prevents any single subtask from remaining unprocessed for a long time and improves the efficiency of subtask processing.
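The S1061-S1063 loop can be sketched as follows. The source does not specify the cache component, so this sketch models it as a hypothetical `InMemoryLockStore` with an atomic set-if-absent per subtask ID. A production system might instead use a shared cache such as Redis with `SET key value NX PX <ttl>`. The loop walks the (priority-sorted) list once, so no subtask is tried twice. It stops at the first lock it acquires.

```python
import threading
from typing import Dict, List, Optional


class InMemoryLockStore:
    """Hypothetical stand-in for the cache component: atomic set-if-absent per key."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def try_acquire(self, subtask_id: str, server_id: str) -> bool:
        with self._mutex:
            if subtask_id in self._owners:
                return False  # another server already holds this subtask's lock
            self._owners[subtask_id] = server_id
            return True


def claim_subtask(subtask_list: List[str], store: InMemoryLockStore,
                  server_id: str) -> Optional[str]:
    """S1061-S1063: try subtasks in list order, each at most once, until a lock is won."""
    for subtask_id in subtask_list:
        if store.try_acquire(subtask_id, server_id):
            return subtask_id  # S1062: lock held; caller downloads the sub-file
    return None  # every lock was taken; nothing to run this round
```

This sketch never releases or expires locks. A real lock would need a TTL or an explicit release, so that a subtask reclaimed after a timeout (see S105) can be locked again.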
In addition, to keep subtask execution states up to date, the server in this embodiment also informs the database, after acquiring the distributed lock in S1062, that the current subtask is executing. This lets the database update the subtask's execution state and execution duration.
S107: The server verifies the attribute data of each commodity in the sub-file. It stores the attribute data of commodities that pass verification in the database.
If any commodity fails verification, the server records exception information about that commodity's attribute data and stores it in the database.
After obtaining the sub-file, the server verifies the attribute data of each commodity in it, that is, checks whether the attribute data meet preset requirements. These requirements may differ between application scenarios. In some scenarios, commodity pictures must meet stricter requirements so that they display well on mainstream terminal devices, and their format and size may be tightly constrained. In other scenarios, the goal is to give users more comprehensive commodity data, so there may be stricter requirements on the number and types of attributes supplied. This embodiment therefore does not restrict the specific verification requirements. In practice, technical personnel preset the requirements for commodity attribute data to suit the application scenario. The server then verifies each commodity's attribute data in the sub-file against those requirements.
When a sub-file contains multiple commodities, the server may verify the attribute data of one commodity at a time. Alternatively, it may process several commodities concurrently, verifying their attribute data at the same time to improve efficiency. Technical personnel may choose either approach; no limitation is imposed here. The rules for verifying the individual attributes of a single commodity are likewise not limited and may be set by technical personnel. For example, some optional embodiments check each attribute of a commodity in turn. Others check several attributes at the same time, which improves verification efficiency.
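The concurrent option can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under stated assumptions. The per-commodity rule is passed in as the hypothetical callable `verify_one`, because the text leaves the rules to technical personnel. The worker count is an arbitrary illustrative default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def verify_all(rows: List[Dict[str, str]],
               verify_one: Callable[[Dict[str, str]], bool],
               max_workers: int = 4) -> List[bool]:
    """Verify several commodities at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(verify_one, rows))
```

Threads suit this workload mainly when verification waits on I/O, such as the picture downloads in S1076. For CPU-bound checks, a process pool would be the usual alternative.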
With the attribute data requirements and verification rules in place, this embodiment verifies attribute data one commodity at a time. Each commodity in the sub-file therefore yields exactly one verification result. The result for a single commodity falls into one of two categories:
1. All attribute data of the commodity pass verification ("verification passed").
2. The commodity's attribute data contain an anomaly, so verification fails ("verification failed").
For a commodity that passes verification, the server stores its attribute data in the database. Because the database stores data in a structured form, this step amounts to structured storage of the attribute data. If the attribute data contain a commodity picture, or the download address of one, this embodiment stores the corresponding picture in the NSP.
For a commodity that fails verification, the server records the corresponding exception information, such as which attribute is abnormal, and reports it to the database. With the anomalies recorded, the CP can re-upload the affected commodity attribute data in a targeted way, which improves the efficiency of commodity data management.
The following example illustrates sub-file data verification. Assume the commodity data is a data table in which each row records all attribute data of one commodity. The mandatory attributes are the commodity ID, name, category, picture address and web page link. Assume further that when the server splits the commodity data in S102, the resulting sub-files are also data tables. The server generates a unique identifier for each commodity in a sub-file and adds it to the commodity's row as the row number. On this basis, and referring to FIG. 2D, the data verification of a sub-file may include steps S1071 to S10710.
S1071: The server reads one row of data from the sub-file and splits it into a first row number and first data.
In this embodiment, only one commodity's attribute data are verified at a time, so the server reads one row per iteration. As an optional embodiment of the present application, each read may be made non-repeating so that the same commodity is not verified more than once. The operation of reading one row of data from the sub-file can then be replaced with:
The server reads one row of data from the sub-file without repetition.
After reading a single commodity's attribute data, this embodiment first extracts the row number (the first row number). This separates the commodity's row number from its remaining attribute data.
The mandatory attributes in this embodiment are the commodity's ID, name, category, picture address and web page link. If the CP has supplied commodity data as required, the remaining attribute data should consist of exactly these five fields.
S1072: The server determines whether the first row number has already been processed. If it has, the server returns to S1071 to process the next row. If not, it proceeds to S1073.
In practice, unexpected situations can cause the same server to process a single row of data more than once. For example, a row may be duplicated into several sub-files that are then processed by the same server. Or, in S1071, the server may read the same row twice when non-repeating reads are not configured. The commodity's attribute data would then be verified repeatedly, lowering verification efficiency. The row number uniquely identifies the commodity, so after parsing out the row number the server first checks whether it has already processed that row number.
If it has, the commodity's attribute data have already been verified and need no further checking. The server selects another row of data and begins verifying it.
If it has not, the commodity's attribute data must be verified, and the server continues with the subsequent steps.
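S1071 and S1072 can be sketched together as a generator that splits each row and skips row numbers already seen. The sketch makes a hypothetical assumption: the sub-file is CSV text whose first column is the row number. The source only says the sub-file is a data table. The function name `unprocessed_rows` is also illustrative.

```python
import csv
import io
from typing import Iterator, List, Set, Tuple


def unprocessed_rows(sub_file_text: str,
                     processed: Set[str]) -> Iterator[Tuple[str, List[str]]]:
    """S1071-S1072: yield (first row number, first data) for rows not yet processed.

    Assumes (hypothetically) a CSV sub-file whose first column is the row number.
    """
    for record in csv.reader(io.StringIO(sub_file_text)):
        if not record:
            continue  # skip blank lines
        row_no, first_data = record[0], record[1:]
        if row_no in processed:
            continue  # already verified: go back to S1071 for the next row
        processed.add(row_no)
        yield row_no, first_data
```

The caller owns the `processed` set, so that set can persist across several sub-files handled by the same server. This covers the case where a row was duplicated into more than one sub-file.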
S1073: The server parses the attribute data in the first data. If parsing fails, the current row is judged abnormal, the corresponding exception information is uploaded to the database, and the process returns to S1071. If parsing succeeds, the server obtains all attribute data contained in the first data and proceeds to S1074.
After the row number check passes, this embodiment parses the attributes of the first data to determine whether its text format is valid. If the commodity's ID, name, category, picture address and web page link can be parsed out normally, the text format is valid. If parsing fails, the text format is invalid and the data cannot be parsed and restored. In that case, this embodiment uploads the exception information for the parsing anomaly to the database, which records it. The anomaly is then reported to the CP, helping the CP quickly locate the abnormal commodity and resupply its attribute data.
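A minimal sketch of S1073 follows, assuming the first data arrive as a list of strings. In this sketch, "parsing succeeds" means exactly five fields are present. That criterion is an assumption, since the source does not define the text format. `Commodity`, `FIELDS` and `parse_first_data` are hypothetical names. A `None` result corresponds to the parsing-failure case, for which the next paragraph gives exception code 2203 as an example.

```python
from dataclasses import dataclass
from typing import List, Optional

# The five mandatory attributes named in this example.
FIELDS = ("commodity_id", "name", "category", "picture_address", "web_link")


@dataclass
class Commodity:
    commodity_id: str
    name: str
    category: str
    picture_address: str
    web_link: str


def parse_first_data(first_data: List[str]) -> Optional[Commodity]:
    """S1073: return the five mandatory attributes, or None if parsing fails."""
    if len(first_data) != len(FIELDS):
        return None  # wrong number of fields: text format invalid
    return Commodity(*(value.strip() for value in first_data))
```

Stricter format rules, such as encoding checks or quoting rules, would slot in before the field-count test.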
This embodiment does not restrict the data type of the exception information; technical personnel may set it according to actual needs. For example, the anomaly may be described in text, such as "data parsing exception", and that text used as the exception information. Alternatively, an exception code may be assigned in advance to each possible anomaly and used as the exception information. For example, the code for a parsing failure may be set to 2203, in which case the exception information is 2203. The exception information may also combine an exception code with text. The same applies to the exception information in each of the following steps and is not repeated below.
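One way to represent exception information as a code, optionally paired with text, is sketched below. The codes and texts are the examples given in this embodiment. The `ExceptionInfo` record and its `row_no` field are hypothetical additions that would let the CP locate the offending row.

```python
from dataclasses import dataclass
from typing import Optional

# Example codes and texts taken from this embodiment.
EXCEPTION_TEXT = {
    2202: "commodity already exists",
    2203: "data parsing exception",
    2204: "commodity parameter verification failed",
    2303: "image download failed",
}


@dataclass(frozen=True)
class ExceptionInfo:
    row_no: str  # hypothetical: identifies the abnormal row for the CP
    code: Optional[int] = None
    text: Optional[str] = None


def make_exception(row_no: str, code: int, with_text: bool = True) -> ExceptionInfo:
    """Build exception info as a code alone, or as a code plus its text."""
    return ExceptionInfo(row_no, code, EXCEPTION_TEXT[code] if with_text else None)
```

Keeping the code as the primary key and the text as optional covers all three forms the text allows: code only, text only (look up by code), or both.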
S1074: The server verifies the validity of each parsed attribute. If any attribute fails the validity check, the current row is judged abnormal, the corresponding exception information is uploaded to the database, and the process returns to S1071. If all attributes pass, the process proceeds to S1075.
After parsing succeeds, this embodiment validates the parsed ID, name, category, picture address and web page link separately, checking each for problems such as missing or erroneous data. For example, assume the categories are predefined as: clothing, digital appliances, shoes, bags, home, toys, beauty, accessories, food and other. This embodiment checks whether the supplied category is one of these. If it is, the category passes the validity check; if not, the check fails. For instance, "pants" is not one of these categories, so the check fails. Likewise, "digital" is incomplete (data are missing), which also fails the check. When a check fails, this embodiment uploads the exception information for the validity check failure to the database, which records it.
Because there are several kinds of attribute data, this embodiment does not restrict the rules for their validity checks; technical personnel may set them. For example, attributes may be checked one at a time or several at once. The check may be configured to stop, and the current row to be judged abnormal, as soon as any attribute fails.
As an optional embodiment of the present application, the exception information for a validity check failure may be the text "commodity parameter verification failed", the exception code 2204, or both.
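S1074 can be sketched as an ordered list of per-attribute checks that stops at the first failure and returns code 2204. The category set comes from the example above. The other rules are assumptions, because the source does not specify them: ID and name must be non-empty, and both addresses must be http(s) URLs with a host. `validate_attributes` is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Category list from the example in S1074.
CATEGORIES = {
    "clothing", "digital appliances", "shoes", "bags", "home",
    "toys", "beauty", "accessories", "food", "other",
}


def _is_url(value: str) -> bool:
    # Assumed rule: an http(s) scheme and a non-empty host.
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate_attributes(attrs: Dict[str, str]) -> Optional[int]:
    """S1074: None if every attribute is valid, else exception code 2204.

    Checks run in order and stop at the first failure.
    """
    checks = (
        lambda a: bool(a.get("commodity_id")),
        lambda a: bool(a.get("name")),
        lambda a: a.get("category") in CATEGORIES,
        lambda a: _is_url(a.get("picture_address", "")),
        lambda a: _is_url(a.get("web_link", "")),
    )
    for check in checks:
        if not check(attrs):
            return 2204
    return None
```

Exact set membership rejects both "pants" (an unlisted category) and "digital" (an incomplete one), matching the two failure examples in the text.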
S1075: Based on the ID in the attribute data, the server determines whether the commodity corresponding to the row already exists. If it does, the current row is judged abnormal, the corresponding exception information is uploaded to the database, and the process returns to S1071. If it does not, the process proceeds to S1076.
In practice, when there are many commodities, the CP may record the same commodity's attribute data more than once while compiling the commodity data. The server might then verify the same commodity repeatedly, lowering verification efficiency. Therefore, in this embodiment, after the attribute validity checks are complete, the server uses the commodity's ID to determine whether it has already processed a commodity with that ID.
If it has, the commodity's attribute data were uploaded more than once and have already been verified, so no further checks are needed. The server uploads the exception information for the abnormal row to the database, which records it. It then selects another row of data and begins verifying it.
If it has not, verification of the commodity's attribute data continues with the subsequent steps.
As an optional embodiment of the present application, the exception information for an already existing commodity may be the text "commodity already exists", the exception code 2202, or both.
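The ID-based duplicate check in S1075 is essentially a per-server "seen" set. The sketch below uses a hypothetical `SeenCommodityIds` helper and returns the example code 2202 for a repeated ID. It differs from the row-number check in S1072, which skips a duplicate silently: a repeated commodity ID is reported as an exception.

```python
from typing import Optional, Set


class SeenCommodityIds:
    """S1075: per-server record of commodity IDs already handled (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def check_and_add(self, commodity_id: str) -> Optional[int]:
        """Return 2202 if the ID was seen before; otherwise record it and return None."""
        if commodity_id in self._ids:
            return 2202  # duplicate upload: report "commodity already exists"
        self._ids.add(commodity_id)
        return None
```

An in-process set only detects duplicates handled by the same server, which matches the text's "whether it has already processed". Cross-server deduplication would need shared storage.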
S1076: Download the commodity image from the image address. If the download fails, the current row data is determined to be abnormal, the corresponding exception information is uploaded to the database, and the process returns to S1071. If the download succeeds, S1077 is executed.
After confirming that the commodity has not been processed, this embodiment attempts to download the commodity image from the image address.
If the download fails, there is a problem with the image address; for example, the address may be wrong or the image source may have been deleted. In this case, this embodiment uploads the exception information corresponding to the image address to the database, which records it.
If the download succeeds, the image address is correct and subsequent verification continues.
In an optional embodiment of this application, the exception information for a failed download may be the text "image download failed", the exception code 2303, or both.
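A download step matching S1076 might look like the following sketch. It assumes the image address is a URL readable with Python's standard `urllib.request.urlopen`; the function name `download_image` and the 10-second timeout are illustrative assumptions. Any failure, whether a malformed address or an unreachable or deleted image source, is mapped to exception code 2303.

```python
import urllib.request

# Exception info from the embodiment: code 2303, text "image download failed".
DOWNLOAD_FAILED = (2303, "image download failed")


def download_image(url, timeout=10.0):
    """Return (image_bytes, None) on success or (None, exception_info) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(), None
    except (ValueError, OSError):
        # ValueError: malformed or unknown-scheme address.
        # OSError (includes URLError, HTTPError, timeouts): wrong address,
        # deleted image source, HTTP error status, unreachable host.
        return None, DOWNLOAD_FAILED
```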
S1077: Determine whether the size of the downloaded commodity image exceeds a preset size threshold. If it does, the current row data is abnormal, the corresponding exception information is uploaded to the database, and the process returns to S1071. If it does not, S1078 is executed.
Overly large commodity images would occupy too much NSP storage space and be inconvenient for users to download and view. To prevent this, this embodiment presets a size threshold and requires that commodity images provided by the CP not exceed it. The specific value of the threshold can be set by technical personnel according to actual needs, for example 2 MB. After downloading the commodity image, this embodiment therefore determines whether the image size exceeds the threshold.
If it does, the image size does not meet the requirements. The server determines that the image size is abnormal and uploads the corresponding exception information to the database, which records it.
If it does not, the image size meets the requirements and subsequent verification continues.
In an optional embodiment of this application, the exception information for an image exceeding the size threshold may be the text "image is too large", the exception code 2305, or both.
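The size check of S1077 reduces to comparing the byte length of the downloaded image with the threshold. In this sketch the 2 MB example value is assumed to mean 2 × 1024 × 1024 bytes; `MAX_IMAGE_BYTES` and `check_image_size` are hypothetical names.

```python
# Assumed interpretation of the 2 MB example threshold (binary megabytes).
MAX_IMAGE_BYTES = 2 * 1024 * 1024
# Exception info from the embodiment: code 2305, text "image is too large".
IMAGE_TOO_LARGE = (2305, "image is too large")


def check_image_size(image_bytes, limit=MAX_IMAGE_BYTES):
    """Return None if the image is within the limit, else the exception info."""
    if len(image_bytes) > limit:
        return IMAGE_TOO_LARGE
    return None
```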
S1078: Identify whether the format of the commodity image is a first format. If it is not, the current row data is determined to be abnormal, the exception information is recorded, and the process returns to S1071. If it is, S1079 is executed.
In practice, images come in a wide variety of formats, but a single server or terminal device usually supports only a limited set. If an image format is unsupported, the server may be unable to process the commodity image, or users may be unable to view it normally. To prevent this, this embodiment goes on to verify whether the format of the commodity image is legal. One or more formats are preset as legal formats (that is, the first format), and the server identifies whether the image's format is one of them. If it is, verification passes. If it is not, verification fails, the format of the commodity image is determined to be abnormal, and the exception information corresponding to the image format is uploaded to the database, which records it.
For example, assume the legal formats are jpg, png, bmp, and gif. If the commodity image is in any one of these formats, its format is considered legal. Conversely, if it is in none of them, the image format is considered abnormal and verification fails.
In an optional embodiment of this application, the exception information for an image format that is not a legal format may be the text "unsupported picture format", the exception code 2304, or both.
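The embodiment does not say how the format is identified. One common approach, assumed here, is to inspect the file signature (magic bytes) rather than trust the extension in the image address. The signatures below are the standard ones for JPEG, PNG, BMP, and GIF; the function names are hypothetical.

```python
# Standard file signatures for the example legal formats (jpg, png, bmp, gif).
SIGNATURES = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"BM": "bmp",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}
# Exception info from the embodiment: code 2304, text "unsupported picture format".
UNSUPPORTED_FORMAT = (2304, "unsupported picture format")


def detect_format(image_bytes):
    """Return the format name whose signature prefixes the data, or None."""
    for magic, fmt in SIGNATURES.items():
        if image_bytes.startswith(magic):
            return fmt
    return None


def check_image_format(image_bytes, legal_formats=frozenset({"jpg", "png", "bmp", "gif"})):
    """Return None if the detected format is legal (the first format), else the exception info."""
    fmt = detect_format(image_bytes)
    return None if fmt in legal_formats else UNSUPPORTED_FORMAT
```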
S1079: Upload the commodity image to the NSP. If the upload fails, the image upload is determined to be abnormal, the exception information is recorded, and the process returns to S1071. If the upload succeeds, S10710 is executed.
After the size, format, and other checks on the commodity image are complete, this embodiment uploads the commodity image to the NSP for storage.
If the upload fails, something went wrong with data transmission between the server and the NSP; for example, the NSP may be faulty or the network between them may be unstable. In this case, this embodiment uploads the exception information for the failed image upload to the database, which records it.
If the upload succeeds, subsequent operations continue.
In one embodiment of this application, to improve the probability of a successful upload, a failed upload may be retried several times, and the image upload is determined to be abnormal only if all attempts fail.
In an optional embodiment of this application, the exception information for a failed image upload may be the text "system abnormality", the exception code 1001, or both.
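The retry behavior described for S1079 can be sketched as a bounded loop. `upload` is a hypothetical callable standing in for whatever NSP client is used, and the attempt count and linear backoff are assumptions; the embodiment only says the upload may be attempted several times.

```python
import time

# Exception info from the embodiment: code 1001, text "system abnormality".
UPLOAD_FAILED = (1001, "system abnormality")


def upload_with_retry(upload, data, max_attempts=3, backoff_s=0.5):
    """Call upload(data) up to max_attempts times; upload raises on failure.

    Returns None on success, or the exception info once every attempt has failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upload(data)
            return None
        except Exception:
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff
    return UPLOAD_FAILED


# Usage: a hypothetical uploader that fails twice, then succeeds.
calls = []


def flaky_upload(data):
    calls.append(data)
    if len(calls) < 3:
        raise ConnectionError("NSP unreachable")


result = upload_with_retry(flaky_upload, b"img", backoff_s=0)  # succeeds on attempt 3
```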
S10710: Upload the first data to the database. If the upload fails, data storage is determined to be abnormal, the exception information is recorded, and the process returns to S1071. If the upload succeeds, the current row data is determined to have passed verification, and the process returns to S1071.
After the commodity image has been uploaded, this embodiment uploads the commodity's attribute data to the database.
If the upload fails, something went wrong with data transmission between the server and the database; for example, the network between them may be unstable. In this case, this embodiment uploads the exception information for the failed attribute data upload to the database, which records it, and selects another commodity from the sub-file for attribute data verification. Because exception information carries less data than attribute data, it is more likely to be uploaded to the database successfully.
If the upload succeeds, storage of the currently verified commodity's attribute data is complete, and another commodity is selected from the sub-file for attribute data verification.
In an optional embodiment of this application, the exception information for a failed attribute data upload may be the text "system abnormality", the exception code 1001, or both.
Through operations S1071 to S10710, the rows of data in the sub-file are processed one by one, so the attribute data of every commodity in the sub-file is verified. Correspondingly, if any of steps S1073 to S10710 determines an abnormality (abnormal row data, abnormal image upload, or abnormal data storage), this embodiment determines that the attribute data of the commodity currently being verified is abnormal, that is, the commodity fails verification. If all of steps S1073 to S10710 succeed, the commodity currently being verified passes verification.
In one embodiment of this application, so that verification of the current sub-file stops promptly once the attribute data of every commodity in it has been verified, freeing the server to perform other tasks, the following is further included after S1071:
S10711: If all the row data in the sub-file has been read, verification of the sub-file is determined to be complete.
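The row-by-row control flow of S1071 to S10711 can be summarized as follows: each row runs through an ordered list of checks, the first failure is recorded and the loop moves on to the next row, and the loop ends once every row has been read. This is a sketch under assumptions: rows are given as `(line_no, row)` pairs, each check returns `None` or a `(code, message)` tuple, and `verify_sub_file`, `SubFileResult`, and `store` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SubFileResult:
    passed: list = field(default_factory=list)      # line numbers that passed
    exceptions: list = field(default_factory=list)  # (line_no, code, message)


def verify_sub_file(rows, checks, store):
    """Process rows one at a time until all have been read (S10711).

    checks -- ordered functions row -> None | (code, message)
    store  -- function row -> None | (code, message), standing in for S10710
    """
    result = SubFileResult()
    for line_no, row in rows:        # S1071: take the next unread row
        error = None
        for check in checks:         # per-row checks, in the embodiment's order
            error = check(row)
            if error is not None:
                break                # abnormal: record it, go back to S1071
        if error is None:
            error = store(row)       # S10710: store the attribute data
        if error is None:
            result.passed.append(line_no)
        else:
            result.exceptions.append((line_no, *error))
    return result


# Usage with a single hypothetical duplicate-ID check.
seen = set()


def dup_check(row):
    if row["id"] in seen:
        return (2202, "the commodity already exists")
    seen.add(row["id"])
    return None


rows = [(1, {"id": "a"}), (2, {"id": "b"}), (3, {"id": "a"})]
res = verify_sub_file(rows, [dup_check], store=lambda row: None)
```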
In this embodiment, the line number, attribute data format, attribute data legality, ID, commodity image download, commodity image size, commodity image upload, and attribute data storage are checked in sequence. This achieves complete and reliable verification of commodity attribute data, while also storing the attribute data and accurately recording exception information.
In an optional embodiment of this application, when attribute data or exception information is stored in the database, a unique identifier of the processed commodity, such as its line number, may be sent to the database at the same time. Then, if an unexpected factor interrupts verification of the sub-file and the subtask times out, another server that takes over the subtask can resume verification after the last verified commodity based on the line number. This improves verification efficiency and reduces repeated verification of commodity attribute data.
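Resuming from the stored line number amounts to skipping every row up to the recorded checkpoint. A minimal sketch, assuming rows are processed in ascending line order and the last stored line number has already been read from the database (`resume_rows` is a hypothetical name):

```python
def resume_rows(rows, last_processed_line):
    """Yield only the (line_no, row) pairs after the recorded checkpoint.

    last_processed_line is the line number most recently stored with attribute
    data or exception info, or None if the subtask has not started yet.
    """
    for line_no, row in rows:
        if last_processed_line is None or line_no > last_processed_line:
            yield line_no, row
```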
In an optional embodiment of this application, when combined with the embodiment shown in FIG. 2C, in which a distributed lock is requested for each subtask to prevent a single subtask from being executed multiple times, a server fault could leave a subtask occupied by that server for a long time and lower subtask execution efficiency. This embodiment provides two optional countermeasures:
1. After the server acquires the distributed lock for a subtask, it starts a timer. If the elapsed time reaches the duration threshold and the subtask has still not been completed, the server actively tells the cache component to release the distributed lock on the subtask. Other servers can then request the distributed lock for the subtask again and process it.
2. After allocating the distributed lock for a subtask to a server, the cache component starts a timer. When the elapsed time reaches the duration threshold, it actively releases the distributed lock on the subtask, that is, it forcibly unlocks it. Any server can then request the distributed lock for the subtask again and process it.
In practice, technical personnel can apply either or both of these two countermeasures to automatically unlock the distributed locks of timed-out subtasks. A subtask whose execution is abnormal is then released promptly and automatically and taken over by another server. This implements automatic node takeover of subtasks and greatly improves the reliability of subtask execution.
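Both countermeasures can be illustrated with a lease-style lock in which each lock carries an expiry time. The class below is an in-memory, single-process stand-in for the cache component, written for illustration only (it is not thread-safe); in a real deployment a cache such as Redis is commonly used, for example via `SET key value NX PX milliseconds`. `LeaseLockCache` and its methods are hypothetical names, and an injectable clock is used so the expiry can be demonstrated without waiting.

```python
import time


class LeaseLockCache:
    """In-memory stand-in for a cache component granting expiring locks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # subtask_id -> (owner, expires_at)

    def try_acquire(self, subtask_id, owner, ttl_s):
        """Grant the lock if it is free or its lease has expired (countermeasure 2)."""
        now = self._clock()
        holder = self._locks.get(subtask_id)
        if holder is not None and holder[1] > now:
            return False  # still held, lease not yet expired
        self._locks[subtask_id] = (owner, now + ttl_s)
        return True

    def release(self, subtask_id, owner):
        """Holder-initiated release, e.g. when its own timer fires (countermeasure 1)."""
        holder = self._locks.get(subtask_id)
        if holder is not None and holder[0] == owner:
            del self._locks[subtask_id]
            return True
        return False


# Usage with a fake clock (seconds) and a 300 s duration threshold.
now = [0.0]
cache = LeaseLockCache(clock=lambda: now[0])
a = cache.try_acquire("sub-1", "server-A", ttl_s=300)  # True
b = cache.try_acquire("sub-1", "server-B", ttl_s=300)  # False: held by A
now[0] = 301.0                                          # A stalled past the threshold
c = cache.try_acquire("sub-1", "server-B", ttl_s=300)  # True: lease expired, B takes over
```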
In another optional embodiment of this application, consider that when a subtask contains a large amount of data, it takes a long time to complete. If this time is longer than the duration threshold and the time at which the database learned that a server selected the subtask is used as the last update time, the subtask may be judged as timed out even though it is executing normally.
For example, assume that subtask A normally takes 6 minutes to execute, the duration threshold is 5 minutes, and the database learns that a server started executing subtask A at 12:00. If 12:00 is kept as subtask A's last update time, then after 12:05 the database considers subtask A timed out even though a server is still executing it normally. The timeout may in turn cause the subtask to be executed repeatedly by several servers, which reduces processing efficiency.
To prevent the database from misjudging a subtask as timed out, in this embodiment, after receiving a commodity's attribute data or exception information, the database on the one hand stores it, so that the attribute data is stored and the exception information recorded. On the other hand, it updates the subtask's last update time to the time at which the attribute data or exception information was received, thereby refreshing the subtask's execution time.
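The 12:00 example can be checked directly. This sketch assumes the timeout rule is "now minus last update time exceeds the threshold"; the date and the 12:04 row-write time are hypothetical values chosen to match the example.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=5)  # duration threshold from the example


def is_timed_out(last_update, now, threshold=THRESHOLD):
    return now - last_update > threshold


start = datetime(2021, 1, 1, 12, 0)           # subtask A selected at 12:00
check_time = datetime(2021, 1, 1, 12, 5, 30)  # checked at 12:05:30, A still running

# Using only the start time as the last update time: wrongly judged as timed out.
naive = is_timed_out(start, check_time)

# Refreshing the last update time on each stored row (e.g. one written at 12:04):
last_row_written = datetime(2021, 1, 1, 12, 4)
refreshed = is_timed_out(last_row_written, check_time)
```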
In addition, in an optional embodiment of this application, to make it easy to report the status of the commodity data back to the CP, the database may create a task detail (TaskDetail) table for each parent task and record exception information in it as the information is received. Once all sub-files of the commodity data have been verified, the task detail table holds all the exception information for the commodity data. From the task detail table, the CP can see exactly which attribute data of which commodity is abnormal and supply corrected attribute data accordingly, improving the efficiency of commodity management.
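A task detail table of this kind could be sketched with Python's built-in `sqlite3` module. The schema is an assumption: for brevity it uses a single `task_detail` table keyed by `parent_task_id` rather than one table per parent task, and all column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE task_detail (
           parent_task_id TEXT NOT NULL,
           subtask_id     TEXT NOT NULL,
           line_no        INTEGER NOT NULL,
           commodity_id   TEXT,
           error_code     INTEGER NOT NULL,
           error_message  TEXT NOT NULL
       )"""
)


def record_exception(conn, parent_task_id, subtask_id, line_no, commodity_id, code, message):
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO task_detail VALUES (?, ?, ?, ?, ?, ?)",
            (parent_task_id, subtask_id, line_no, commodity_id, code, message),
        )


record_exception(conn, "task-1", "sub-1", 3, "sku-001", 2202, "the commodity already exists")
record_exception(conn, "task-1", "sub-2", 8, "sku-042", 2305, "image is too large")

# Feedback for the CP: which attribute data of which commodity was abnormal.
rows = conn.execute(
    "SELECT commodity_id, error_code, error_message FROM task_detail "
    "WHERE parent_task_id = ? ORDER BY subtask_id, line_no",
    ("task-1",),
).fetchall()
```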
In an optional embodiment of subtask processing in this application, the overall flow of subtask processing is shown in FIG. 2E. In this embodiment, multiple servers process subtasks concurrently, and distributed locks are introduced to prevent a single subtask from being processed repeatedly by several servers. The details are as follows:
Each server obtains subtasks from the database and at the same time requests the distributed lock for each subtask, that is, it competes for the lock.
The server that obtains the lock acts as the execution body of S104 to S107 and downloads the sub-file corresponding to the subtask from the NSP. (Because subtasks are processed concurrently by multiple servers, the server acting as the execution body of S104 to S107 may be the same or different for each subtask.)
After downloading the sub-file, the server verifies it.
During verification, the server downloads commodity images and stores them in the NSP, and also stores the attribute data of the commodities in the sub-file in the database, so commodity images and attribute data are stored concurrently.
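The lock-competition flow of FIG. 2E can be simulated with threads standing in for servers. In this sketch a `threading.Lock` plays the role of the cache component's atomic lock grant, and download, verification, and storage are omitted; all names are hypothetical. The point it shows is that, because checking and taking a subtask's lock happen atomically, every subtask is processed exactly once even with several workers.

```python
import threading

subtasks = {f"sub-{i}": "pending" for i in range(20)}  # subtask ID -> execution status
guard = threading.Lock()  # makes "check and take the lock" atomic, like the cache component
locked = set()            # subtask IDs whose distributed lock is currently held
processed = []            # (server, subtask) pairs, used to check for duplicate work


def try_lock(subtask_id):
    """Grab the lock only if the subtask is still pending and unlocked."""
    with guard:
        if subtask_id in locked or subtasks[subtask_id] != "pending":
            return False
        locked.add(subtask_id)
        return True


def worker(server):
    while True:
        with guard:
            candidates = [s for s, st in subtasks.items() if st == "pending" and s not in locked]
        if not candidates:
            return  # remaining subtasks are done or held by other servers
        for sub in candidates:
            if try_lock(sub):
                # Download the sub-file, verify it, and store data here (omitted).
                with guard:
                    processed.append((server, sub))
                    subtasks[sub] = "done"  # S108: execution completed
                    locked.discard(sub)     # release the distributed lock
                break


threads = [threading.Thread(target=worker, args=(f"server-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```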
S108: After storing the attribute data of all commodities in the sub-file that passed verification in the database, the server determines that execution of the subtask is complete and sends a status update instruction for the subtask to the database, so that the subtask's execution status in the database is updated to "execution completed".
When the verification result of a single commodity is obtained (either a pass or a fail counts as a verification result), this embodiment determines that verification of that commodity is complete. After all commodities in the subtask have been verified, the server sends a status update instruction to the database to inform it that the subtask has been executed. On receiving the instruction, the database updates the subtask's execution status to "execution completed". The specific content of the status update instruction is not limited here.
Corresponding to the two possible verification results, pass and fail, completion of a subtask in this embodiment covers at least two cases:
1. The subtask has been executed, and the attribute data of all commodities in it passed verification.
2. The subtask has been executed, but some commodities in it have abnormal attribute data, that is, their attribute data failed verification.
In an optional embodiment of this application, a subtask that has been executed and whose commodities all passed attribute data verification may be deleted from the database to save database storage space.
In an optional embodiment of this application, when combined with the embodiment shown in FIG. 2C, in which a distributed lock is requested for each subtask to prevent a single subtask from being executed multiple times, the server releases its distributed lock on the current subtask after determining that the subtask has been executed.
S109: After executing the subtask, the server sends another task query request to the database.
After completing the current subtask, the server goes on to process the next one, so it returns to S104 and sends a new task query request to the database.
S110: After receiving the task query request, the database identifies the execution status of each subtask in the parent task. If all subtasks in the parent task have been executed, it determines that storage of the commodity data is finished, generates a storage result, and sends the storage result to the server.
S111: The server returns the storage result to the CP terminal.
If any subtask has not been completed, for example because it has not been executed or is still executing, S105 is executed.
In this embodiment, after receiving the task query request, the database identifies the execution status of each subtask under the parent task. Unlike when the parent task and subtasks were first created in the database, at least one server has by now executed subtasks under the parent task, so there are two possible cases for the parent task:
1. All subtasks under the parent task have been executed.
2. Some subtasks under the parent task have not yet been completed (including subtasks not yet executed and subtasks in progress).
If some subtasks under the parent task have not been completed, S105 is executed to carry out the pending subtasks, followed by the operations of S105 to S109.
If all subtasks under the parent task have been executed, all the commodity data uploaded by the CP has been processed. For commodities whose attribute data passed verification, processing means the attribute data has been stored; for commodities whose attribute data failed verification, processing means the corresponding exception information has been recorded in the database. This embodiment therefore determines that storage of the current commodity data is finished.
Once storage is finished, the CP needs to be informed of the outcome. The database therefore determines how each subtask under the parent task was actually executed and generates a corresponding storage result. The outcome may be one of the following:
1. The attribute data of all commodities in the commodity data was stored successfully. The storage result can then be set to "commodity data stored successfully".
2. Some commodities in the commodity data have abnormal attribute data, and the database has recorded the corresponding exception information. The storage result can then be set to "some commodity data stored successfully", with the recorded exception information included as part of the result. When a task detail table is used to record exception information, the task detail table is included as part of the storage result.
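The decision made in S110 and the two storage outcomes listed above can be sketched as one function. The status strings, the `"done"` state name, and the function name `build_storage_result` are assumptions for illustration.

```python
def build_storage_result(subtask_states, task_detail_rows):
    """Return None while any subtask is unfinished (the flow goes back to S105);
    otherwise return the storage result to be sent towards the CP (S110, S111)."""
    if any(state != "done" for state in subtask_states.values()):
        return None
    if not task_detail_rows:
        # Case 1: every commodity's attribute data was stored.
        return {"status": "commodity data stored successfully"}
    # Case 2: some commodities failed; attach the recorded exception information.
    return {
        "status": "some commodity data stored successfully",
        "task_detail": list(task_detail_rows),
    }
```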
After obtaining the storage result, the database sends it to the server, which returns it to the CP terminal. Finally, the CP terminal displays the storage result to the CP.
The operation of S110 may be performed either by the database itself or by the terminal device on which the database resides; see the related description of S105 for details.
If the attribute data of all commodities in the commodity data was stored successfully, the CP has uploaded the commodity data effectively. If some commodities have abnormal attribute data, the CP can view the exception information in the storage result (or view the task detail table directly, if there is one), identify the abnormal commodities from it, and re-compile or check their attribute data. The CP then uploads this attribute data to the NSP again as new commodity data, so that storage of the abnormal commodities is retried.
In this embodiment, the CP only needs to provide commodity data in a certain format, and the commodity data may be structured or unstructured. When unstructured commodity data is used, the CP need not structure it. After receiving the commodity data, the commodity management system splits it into multiple sub-files and creates a subtask for each sub-file. One or more servers then verify each subtask's data while storing the commodity data in the sub-files to the database and the corresponding commodity images to the network storage platform. This makes storage more efficient and enables efficient management of commodity data. Moreover, storing the attribute data step by step amounts to storing the commodity data in structured form, so this embodiment achieves structured storage of commodity data whether the input is structured or unstructured.
In the data management process of this embodiment, the CP only needs to provide commodity data in the required format for it to be imported into the database offline ("offline import", where offline means the user does not need to perform online operations after uploading). The CP need not perform any data structuring. In practice the CP has to compile its commodity data anyway, whether for inventory management or for listing on an e-commerce platform, so organizing it in the required format requires little extra work. Compared with the prior art, this embodiment greatly lowers the technical threshold for CP operations and improves usability. Automated verification and storage of commodity data also greatly improve the efficiency of commodity data management.
In addition, when distributed locks are applied, their properties ensure that when multiple servers process subtasks concurrently, several servers will not process the same subtask at the same time and make subtask processing inefficient. This embodiment therefore enables highly concurrent and efficient processing of subtasks. When a distributed lock is held for too long, the server and the cache component can automatically unlock the subtask, so other servers can request the lock again and process it. This implements node management and automatic takeover of subtasks, prevents a server fault from keeping a subtask from executing normally for a long time, and makes subtask processing more reliable.
Finally, by splitting commodity data, supporting concurrent processing on multiple servers, and automatically taking over abnormal subtasks, this embodiment can effectively process large batches of commodity data and therefore supports large-task scenarios. Analyzing and reporting exception information for commodity attribute data shows the details of task failures, which helps the CP supplement the abnormal attribute data in a targeted way and improves the CP's operating efficiency.
As an optional embodiment of this application, note that in S107 the server stores attribute data and records exception information synchronously while it verifies the attribute data in the sub-file. In practice, the server may instead first verify the attribute data of the whole sub-file, and store the attribute data and exception information in the database only after the sub-file has been verified. Referring to FIG. 3A, S107 can then be replaced with:
S201: The server verifies the attribute data of each commodity in the sub-file.
S202: After the sub-file has been verified, the server stores in the database the attribute data of the commodities in the sub-file that passed verification. For commodities that failed verification, it records exception information about their attribute data and stores that exception information in the database.
For the principle and details of sub-file data verification, refer to the related description of the embodiment shown in FIG. 2A; they are not repeated here.
It should be noted that in this embodiment the server first completes verification of the attribute data of every commodity in the sub-file. For each commodity that passed verification, all of its attribute data is then stored in the database. For each commodity that failed verification, exception information about its attribute data is recorded, and all of the exception information is sent to the database together. For a description of the exception information, refer to the related description of S107; it is not repeated here.
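The verify-first, store-afterwards flow of S201-S202 can be sketched as follows. This is an illustrative sketch, not the application's implementation: `verify_then_store`, the validator signature (returns `None` on success, else an error message) and the dict standing in for the database are all hypothetical. The point it shows is that the database is written only once per sub-file, after every commodity has been checked.

```python
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, str]
# Hypothetical validator: None means the commodity passed, otherwise an error message.
Validator = Callable[[Item], Optional[str]]


def verify_then_store(items: List[Item], validate: Validator,
                      db: Dict[str, list]) -> Tuple[int, int]:
    """Verify every commodity in one sub-file, then write results in one phase."""
    passed: List[Item] = []
    errors: List[Tuple[str, str]] = []
    for item in items:
        err = validate(item)
        if err is None:
            passed.append(item)
        else:
            errors.append((item.get("id", "?"), err))
    # Single write phase after the whole sub-file has been verified:
    # attribute data for passing commodities, exception info for the rest.
    db.setdefault("attributes", []).extend(passed)
    db.setdefault("exceptions", []).extend(errors)
    return len(passed), len(errors)
```

For example, with a validator that rejects commodities lacking a price, a sub-file holding one priced and one unpriced commodity yields `(1, 1)` and a single exception record for the second.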
It should be particularly noted that this embodiment may be combined with the embodiment shown in FIG. 2D. In that case, operations S1071-S10711 are performed first, and after verification of the sub-file is completed, the attribute data and exception information are stored in the database again.
The following compares the embodiment shown in FIG. 2A (hereinafter embodiment a) with the embodiment shown in FIG. 3A (hereinafter embodiment b). In embodiment a, commodity attribute data is stored while a single sub-file is still being verified, and each verification-and-storage step handles the attribute data of one commodity. In embodiment b, commodity attribute data is stored only after the entire sub-file has been verified. The two embodiments differ in the following respects:
1. Verification granularity. In embodiment a, the server verifies and stores the attribute data of one commodity at a time, so each operation has single-commodity granularity. In embodiment b, the server stores commodity attribute data only after a whole sub-file has been verified, so the granularity is a single sub-file.
Because a single sub-file usually contains the attribute data of many commodities, embodiment a has finer verification granularity than embodiment b.
2. Verification time and network resource consumption. Because a single sub-file usually contains the attribute data of many commodities, the server in embodiment a must exchange data with the database many times while verifying one sub-file. Embodiment a therefore consumes more network resources and places higher demands on the quality of the network connection between the server and the database. The repeated exchanges also lengthen sub-file verification. Compared with embodiment b, embodiment a thus takes longer to verify, consumes more network resources, and requires a better network connection.
3. Handling of server failures. In practice, a server may suffer a power failure, crash, or other abnormal condition while verifying a sub-file, and may then abort the verification.
In embodiment a, the server can in principle keep verification and storage synchronized at the level of a single commodity, so while the sub-file is being verified, the attribute data or exception information of each commodity is stored in the database as it is produced. If the server fails, the database still holds the results for every commodity in the current sub-file that was verified before the failure. When another server re-verifies the sub-file, it can either start from the beginning or continue with the commodities whose attribute data has not yet been stored.
For example, suppose sub-file a contains the attribute data of 1000 commodities, the server verifies them in order, and a failure occurs while the 500th commodity is being verified (its verification has not finished), so verification cannot continue. At this point the attribute data of the first 499 commodities has already been stored. When another server verifies sub-file a, it can either re-verify all 1000 commodities or resume verification from the 500th commodity.
In embodiment b, if a failure prevents the server from finishing verification of the current sub-file, none of the attribute data in that sub-file reaches the database, so another server must verify the entire sub-file again.
In summary, when a server fails, embodiment a can in principle reduce the likelihood of re-verifying a sub-file compared with embodiment b, thereby reducing the verification workload and handling server failures effectively.
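The per-commodity storage of embodiment a, and the resumption it enables in the 500th-commodity example, can be sketched as follows. All names are hypothetical, and a dict keyed by each commodity's position in the sub-file stands in for the database. Because one result is written per commodity, a replacement server can find the first position with no stored result and continue from there (position 499 is the 500th commodity).

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]
# Hypothetical validator: None means the commodity passed, otherwise an error message.
Validator = Callable[[Item], Optional[str]]


def verify_and_store_each(items: List[Item], validate: Validator,
                          db: Dict[int, dict], start: int = 0) -> int:
    """Verify and persist one commodity at a time, starting at `start`.

    Returns the number of commodities processed in this call.
    """
    for pos in range(start, len(items)):
        err = validate(items[pos])
        # One database write per commodity: finer granularity, more round trips.
        db[pos] = {"ok": err is None, "error": err}
    return len(items) - start


def resume_point(db: Dict[int, dict], total: int) -> int:
    """First position in the sub-file that has no stored result yet."""
    pos = 0
    while pos < total and pos in db:
        pos += 1
    return pos
```

If the first server raises while checking position 499, the dict holds 499 results, `resume_point` returns 499, and a second call with `start=499` processes the remaining 501 commodities.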
Based on these differences, technicians can choose embodiment a or embodiment b to verify and store commodity data according to actual application requirements; no specific limitation is imposed here.
Supplementary notes to Part 1: management of commodity data by the commodity management system.
(1) The progress of the offline import of commodity data can be displayed to the CP.
Building on the embodiments shown in FIG. 2A to FIG. 3A, and to let the CP know how the storage of its commodity data is progressing, in this embodiment the server collects statistics on the progress of the offline import of commodity data and feeds them back to the CP terminal. The details are as follows:
The server sends a progress query request to the database.
After receiving the progress query request, the database obtains a first number, the number of completed subtasks under the parent task, and a second number, the total number of subtasks under the parent task.
The database generates progress data from the first number and the second number and sends it to the server.
The server sends the progress data to the CP terminal.
The CP terminal displays the progress data.
In this embodiment, the server may send a progress query request to the database. On receiving the request, the database responds by obtaining the number of completed subtasks under the parent task (the first number) and the total number of subtasks (the second number), generating progress data from these two numbers, and sending it to the server, which forwards it to the CP terminal for display.
The server may query actively according to a rule, for example at scheduled times or periodically. It may also respond to a query initiated by the CP: the CP performs an operation on the CP terminal, the CP terminal sends a query request to the server, and the server then queries the database.
This embodiment does not restrict how the offline import progress is represented, so the format of the progress data and how it is derived are not limited either; they can be set by technicians as needed. For example, progress can be expressed as a percentage, in which case the progress data is the number of completed subtasks under the parent task as a percentage of the total number of subtasks. If the parent task has 20 subtasks and 10 have completed, the progress data is (10 ÷ 20) × 100% = 50%. Alternatively, progress can be expressed as "number of completed subtasks/total number of subtasks". In that case the database does not need to process the two numbers and can return both values to the server as the progress data, and the CP terminal displays the progress in that form; with 10 of 20 subtasks completed, the CP terminal displays "10/20".
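The two progress representations can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the `"DONE"` status value is an assumption, since the application does not specify how a completed subtask is marked.

```python
from typing import Iterable, Tuple


def count_progress(subtask_states: Iterable[str]) -> Tuple[int, int]:
    """Return (first number, second number) for one parent task."""
    states = list(subtask_states)
    done = sum(1 for s in states if s == "DONE")  # "DONE" is a hypothetical status value
    return done, len(states)


def as_percentage(done: int, total: int) -> float:
    """Percentage form: completed subtasks as a share of all subtasks."""
    if total <= 0:
        raise ValueError("parent task has no subtasks")
    return done / total * 100


def as_fraction(done: int, total: int) -> str:
    """'completed/total' form: the two numbers passed through unprocessed."""
    return f"{done}/{total}"
```

With 10 of 20 subtasks marked done, `as_percentage` gives `50.0` and `as_fraction` gives `"10/20"`, matching the two examples above.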
This embodiment thus provides progress feedback for the offline import of commodity data, so that the CP can learn the progress in a timely manner.
(2) The embodiments shown in FIG. 2A to FIG. 3A can also be used for online management of commodity data.
In this embodiment, online management of commodity data includes adding, deleting, modifying, and querying commodity data. Adding, deleting, and modifying mean, respectively, adding the attribute data of new commodities, deleting the attribute data of existing commodities, and modifying the attribute data of existing commodities, relative to the commodity data already stored. Querying means retrieving the attribute data of existing commodities from the stored commodity data.
When the CP has many commodities to manage, the embodiments shown in FIG. 2A to FIG. 3A are preferably used for offline import and management of commodity data: the CP uploads the commodity data to the commodity management system and, after waiting for a period of time, the data has been stored and is under management. When there is little commodity data to manage, for example only a few or a dozen commodities, two approaches are available. On the one hand, the embodiments shown in FIG. 2A to FIG. 3A can still be used for offline import. They process the commodity data as usual, but because there are few commodities, splitting will very likely produce only one sub-file (that is, no actual splitting occurs and the commodity data is processed as a single sub-file), so the processing steps are relatively simple. On the other hand, because structuring a small amount of commodity data is not difficult, existing techniques can also be used: the commodity data is first structured and then uploaded to the storage bucket.
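Why a small upload tends to yield a single sub-file can be illustrated with a fixed-size splitting rule. This is a hedged sketch: splitting by a maximum number of commodity records per sub-file is an assumption made for illustration (the actual splitting rule of this application is described elsewhere), and `split_into_subfiles` and `max_per_subfile` are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subfiles(records: Sequence[T], max_per_subfile: int) -> List[List[T]]:
    """Split commodity records into consecutive chunks of at most max_per_subfile."""
    if max_per_subfile <= 0:
        raise ValueError("max_per_subfile must be positive")
    return [list(records[i:i + max_per_subfile])
            for i in range(0, len(records), max_per_subfile)]
```

With a limit of 1000 records, a dozen commodities produce exactly one sub-file, while 2500 commodities produce three sub-files of 1000, 1000 and 500 records.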
In summary, the CP can choose the method of storing and managing commodity data according to its needs; no specific limitation is imposed here. It should be understood, however, that the embodiments shown in FIG. 2A to FIG. 3A are applicable to all scenarios, whether the number of commodities is large or small.
As an optional embodiment of this application, the server provides a callable API to the CP terminal, together with software development kits (SDKs) for multiple languages such as Java, PHP, C++, and Python.
Referring to FIG. 3B, the commodity management system provides the CP with a commodity management service (that is, the commodity management implemented by the embodiments shown in FIG. 2A to FIG. 3A) and provides users with an online commodity search service. The CP calls the server API through the CP terminal to trigger the commodity management service and thereby manage commodity data online. When commodity data needs to be added, any of the embodiments shown in FIG. 2A to FIG. 3A can be used. When there is little commodity data, sub-file splitting may be skipped: the server API directly verifies each piece of attribute data in the commodity data, stores the attribute data, and records exception information.
For deletion, modification, and query, the CP calls the server API through the CP terminal to tell the server which commodity to operate on and what the operation is. Once the server knows the commodity and the operation, it performs that operation on the commodity in the database. For example, it can query or modify the price of commodity A in the database, or delete all attribute data of commodity A.
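The server-side handling of delete, modify and query requests can be sketched as a simple dispatch over the operation named in the request. This is a hypothetical sketch: `handle_online_request`, the operation strings and the in-memory `catalog` dict (commodity ID to attribute data) stand in for the server API and database, which the application does not specify. Adding commodities is left out because it goes through the import flow described above.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the database: commodity ID -> attribute data.
Catalog = Dict[str, Dict[str, Any]]


def handle_online_request(catalog: Catalog, op: str, product_id: str,
                          changes: Optional[Dict[str, Any]] = None) -> Any:
    """Apply one query/modify/delete request to a single commodity."""
    if op == "query":
        # Return a copy so callers cannot mutate stored data by accident.
        return dict(catalog.get(product_id, {}))
    if op == "modify":
        if product_id not in catalog:
            raise KeyError(product_id)
        catalog[product_id].update(changes or {})
        return dict(catalog[product_id])
    if op == "delete":
        # Deletes all attribute data of the commodity; True if it existed.
        return catalog.pop(product_id, None) is not None
    raise ValueError(f"unsupported operation: {op}")
```

For commodity A priced at 10, a `"modify"` request with `{"price": 8}` updates the price, and a later `"delete"` removes all of A's attribute data, after which a `"query"` returns an empty result.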
As an optional embodiment of this application, FIG. 3C is an interaction diagram of the service scenarios of the commodity management system based on the embodiment shown in FIG. 3B. On the one hand, the CP operates the CP terminal as needed and calls the server API through it to trigger the commodity management service. Once triggered, the commodity management system processes the commodity data according to the CP terminal's actual operation and informs the CP terminal of the management result, for example the result of deleting, modifying, or querying commodity data. On the other hand, a user operates a user terminal as needed and inputs commodity text or a commodity picture to the commodity management system, triggering its online search service. On receiving the commodity text or picture from the user terminal, the commodity management system performs an online commodity search based on it and returns the search results to the user terminal for the user to view.
(3) The servers in the embodiments shown in FIG. 2A to FIG. 3A are described as follows:
In the embodiments of this application, many operations are performed by a server. They fall into at least the following four groups:
1. Splitting commodity data, uploading sub-files to the NSP, and creating the parent task and subtasks in the database, as in S102-S103.
2. Querying subtasks, downloading and verifying sub-files, and storing attribute data, as in S104, S106, S1061-S1063, S107, S1071-S10711, S108-S109, and S201-S202.
3. Sending the storage result to the CP terminal, as in S111.
4. Managing commodity data online, as described in supplementary note (2).
In practice, the embodiments shown in FIG. 2A to FIG. 3A can be implemented in either of two ways:
a. A single server performs all four groups of operations.
b. Multiple servers jointly perform the four groups of operations.
In mode a, every server in the embodiments shown in FIG. 2A to FIG. 3A is the same server. This suits scenarios with few CPs and little commodity data, and keeps server costs low.
Mode b suits scenarios with many CPs and large amounts of commodity data. Processing with multiple servers allows commodity data to be handled concurrently and improves the efficiency of commodity data management. In this mode, each of the four groups of operations may be performed by any of the servers. This embodiment does not restrict how the server that performs each group is determined; technicians can set it according to actual needs.
For example, for operation 1 (splitting commodity data, uploading sub-files to the NSP, and creating the parent task and subtasks in the database), in some optional embodiments one of the servers is designated to perform operation 1, so that this server splits the commodity data, uploads the sub-files, and creates the tasks no matter which CP uploads commodity data. In other optional embodiments, each time a CP uploads commodity data, one server is selected at random from the multiple servers to perform operation 1. For example, a distributed lock mechanism can be introduced: every server simultaneously applies to the cache component for the distributed lock on the commodity data, and only the server that obtains the lock performs operation 1.
For operation 2 (querying subtasks, downloading and verifying sub-files, and storing attribute data), similarly to operation 1, in some optional embodiments one of the servers is designated to perform operation 2, so that this server queries subtasks, downloads and verifies sub-files, and stores attribute data no matter which CP uploaded the commodity data. In other optional embodiments, when subtasks are waiting to be executed, multiple servers query and process subtasks simultaneously, which enables highly concurrent subtask processing. For example, a distributed lock mechanism can be introduced: every server simultaneously applies to the cache component for distributed locks on the subtasks, and each subtask is executed by the server that obtains its lock.
For operation 3 (sending the storage result to the CP terminal), similarly to operation 1, in some optional embodiments one of the servers is designated to perform operation 3, and the database sends the storage result to that server, which forwards it to the CP terminal. In other optional embodiments, each time the database obtains a storage result, it randomly selects one of the servers and sends the result to it.
For operation 4 (managing commodity data online), similarly to operation 1, in some optional embodiments one of the servers is designated to perform operation 4: the CP terminal tells that server which commodity to operate on each time and what the operation is, and that server manages the commodity data online. In other optional embodiments, each request from the CP terminal is sent to a randomly selected server among the multiple servers, which then performs the online management of the commodity data.
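The lock-based variant of operation 2, in which several servers race for each pending subtask and only the winner processes it, can be simulated with threads. This is a hedged sketch: threads stand in for servers, a `threading.Lock` around a shared dict stands in for the cache component's atomic "claim if unclaimed" step, lease expiry is omitted, and `run_servers` is a hypothetical name. It checks the property the text relies on: every subtask is processed exactly once.

```python
import threading
from typing import Dict, List, Tuple


def run_servers(subtask_ids: List[str],
                n_servers: int) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Simulate n_servers concurrently claiming and processing subtasks."""
    owner: Dict[str, str] = {}                    # subtask -> server holding its lock
    runs: Dict[str, int] = {t: 0 for t in subtask_ids}
    guard = threading.Lock()                      # stands in for the cache's atomicity

    def try_claim(task: str, server: str) -> bool:
        with guard:
            if task in owner:
                return False                      # another server already holds the lock
            owner[task] = server
            return True

    def server_loop(server: str) -> None:
        for task in subtask_ids:
            if try_claim(task, server):
                with guard:
                    runs[task] += 1               # placeholder for download, verify, store

    threads = [threading.Thread(target=server_loop, args=(f"server-{i}",))
               for i in range(n_servers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner, runs
```

Which server wins a given subtask depends on thread scheduling, but each subtask ends up with exactly one owner and a processing count of one.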
Part 2: Commodity search by users
Building on the commodity data management of Part 1, and to help users find commodities and give commodities exposure, this embodiment provides users with a commodity search function. A user can upload text or pictures related to a commodity to the commodity management system as needed, and the system searches for commodities based on the uploaded text or pictures and returns the search results. Commodity search can be divided into two stages, before search and during search, detailed as follows:
(1) Before search
Before search, image feature analysis must first be performed on the commodity pictures to obtain the image feature data used for picture search. In this embodiment, image feature analysis can take place at either of the following two stages:
Stage 1: While sub-files are being verified in the embodiments shown in FIG. 2A to FIG. 3A, image feature analysis is performed on the commodity pictures at the same time.
Stage 2: After the embodiments shown in FIG. 2A to FIG. 3A have finished storing the commodity data, image feature analysis is performed on the commodity pictures stored in the NSP.
In practice, technicians can choose either stage for image feature analysis as needed.
For example, if image feature analysis is set to occur at stage 1, then in the embodiments shown in FIG. 2A to FIG. 3A, while verifying a single sub-file the server performs image feature analysis on the pictures of each commodity that passes verification, and stores the resulting picture feature data in a feature library. If it is set to occur at stage 2, the server can, after verification of the sub-file is completed, perform image feature analysis on the pictures of the commodities that passed verification and store the resulting picture feature data in the feature library.
Image feature analysis is the same for every commodity picture. For a single commodity picture, it comprises:
S301: The server obtains the commodity picture, performs image feature analysis on it, and stores the resulting image feature data in the feature library.
First, note that the server in this embodiment may be the server that verifies the sub-file or another server; this can be set by technicians as needed and is not limited here. Accordingly, how the commodity picture is "obtained" may differ depending on the server and on the stage at which image feature analysis occurs. For example, when S301 is performed by the server that verifies the sub-file, obtaining the picture may mean reading the commodity picture that server has already downloaded (see S106, where the server has already downloaded the commodity picture through its download address). If the commodity picture is not available locally, it must be downloaded from the NSP, and obtaining then means downloading the commodity picture from the NSP.
In addition, this embodiment does not restrict the specific image feature analysis method; technicians can choose it according to actual needs. For example, image feature extraction models based on neural networks or deep learning can be pre-trained for this purpose. The data type and content of the image feature data depend on the method chosen: for example, image feature points together with feature vectors describing them, a single image feature vector such as a 1024-dimensional floating-point vector, or other feature data.
As an optional embodiment of this application, commodities within the same category tend to share certain characteristics; for example, their shapes are often similar. To improve image feature analysis so that the resulting image feature data better characterizes the commodity, this embodiment can pre-design a different image feature extraction model for each commodity category. During image feature analysis, the model is selected according to the category the commodity actually belongs to (in this case the commodity data must include the commodity's category), and the analysis is performed with that model.
As an example, suppose commodities are divided into 10 categories: clothing, digital and home appliances, shoes, luggage, home furnishing, toys, beauty, accessories, food, and other. One image feature extraction model can be designed for each category, giving 10 models. When analyzing the image features of a commodity, the image feature extraction model for the current commodity is first determined from the category in its commodity data, and that model is then used to analyze the image features of the commodity and obtain the image feature data.
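Per-category model selection can be sketched as a registry keyed by category name. This is an illustrative sketch under stated assumptions: `build_registry`, `extract_features`, the `load_model` callback and the bytes-in, vector-out extractor signature are hypothetical, and falling back to the "other" model for a missing or unknown category is a design choice made here, not something the application specifies. A real `load_model` would load a trained network; the usage below plugs in a stub.

```python
from typing import Callable, Dict, List, Mapping

# Hypothetical extractor signature: raw image bytes in, feature vector out.
FeatureExtractor = Callable[[bytes], List[float]]

CATEGORIES = ("clothing", "digital_home_appliances", "shoes", "luggage",
              "home_furnishing", "toys", "beauty", "accessories", "food", "other")


def build_registry(load_model: Callable[[str], FeatureExtractor]) -> Dict[str, FeatureExtractor]:
    """Create one feature extractor per commodity category."""
    return {category: load_model(category) for category in CATEGORIES}


def extract_features(registry: Mapping[str, FeatureExtractor],
                     product: Mapping[str, str], image: bytes) -> List[float]:
    """Pick the extractor for the commodity's category, then run it."""
    # Assumed fallback: missing or unknown categories use the "other" model.
    extractor = registry.get(product.get("category", "other"), registry["other"])
    return extractor(image)
```

With a stub loader whose extractor returns `[category index, image length]`, a shoe picture of three bytes maps to `[2.0, 3.0]`, and an unlisted category such as `"garden"` falls back to the "other" model and gives `[9.0, 3.0]`.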
As an optional embodiment of the present application, to improve the accuracy of image search, commodity information is also introduced as an auxiliary signal for the image feature analysis of commodity pictures. Details are as follows:
In this embodiment, a neural-network-based image feature analysis model is pre-trained, and commodity data is analyzed with this model to obtain the corresponding image feature data. A separate image feature analysis model can be set up for each commodity category.
The training process of the image feature analysis model includes:
Preset an initial model.
Obtain the product pictures and corresponding product information of multiple sample products as sample data, and label each product picture and each piece of product information with the classification label of its sample product.
Use the initial model to extract features from the product information in the sample data, and compute a first loss function for the product information from the extracted text features and the corresponding classification labels. The text features may be word vectors or other text features. The extraction method is not limited here; for example, the product information can be segmented into words, and word vectors obtained by text embedding. The text features can first be processed and classified by, for example, a fully connected layer, after which the loss is computed from the classification results and the classification labels.
Use the initial model to extract image features from the product pictures in the sample data, and compute a second loss function for the product pictures from the image features and the corresponding classification labels. As with the text branch, the image features can first be classified by a fully connected layer or similar, and the loss then computed from the classification results and the classification labels.
Compute a third loss function from the first and second loss functions, and iteratively update the initial model using the computed third loss value until a preset convergence condition is met, yielding a trained model.
From the trained model, extract the networks used for product-picture feature extraction; these extracted networks form the image feature analysis model.
The specific types of the first, second, and third loss functions are not limited here and can be set by technical personnel as required. For example, the first loss function, which applies to product information, may be a text classification loss (Text Class Loss) or another loss. The second loss function, which applies to product pictures, may be an image triplet loss (Image Triplet Loss) or an image classification loss (Image Class Loss), or another loss. The third loss function may be a Kullback-Leibler (KL) divergence loss or another loss.
In this embodiment, the product pictures and product information of the sample products are each processed in the manner of training a classification model. Once losses for both modalities are obtained, multi-modal fusion training is performed: the two loss values are fused through a new loss function, and the model is iteratively updated based on the fused loss value. Finally, the networks of the trained model used for product-picture feature extraction are extracted (i.e., the networks for product-information feature extraction are discarded) to form a new model for image feature analysis. In practice, an image feature analysis model trained in this way extracts product-picture features more accurately and reliably, and the resulting image feature data characterizes product pictures well, giving high accuracy in product image matching.
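The three-loss training objective can be illustrated for a single sample. In this minimal pure-Python sketch, both branches are assumed to end in a fully connected classification layer producing logits, the first and second losses are taken as softmax cross-entropy, and the third loss is assumed to be the sum of both plus a weighted KL divergence between the image and text class distributions. The text only says the third loss is computed from the first two and may be a Kullback-Leibler loss, so this exact combination and the `kl_weight` parameter are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Classification loss of one branch against the sample's class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fused_loss(text_logits: Sequence[float], image_logits: Sequence[float],
               label: int, kl_weight: float = 1.0) -> float:
    """Third loss: combines the text (first) and image (second) losses.

    Assumption: cross-entropy for both branches plus a KL term that penalizes
    disagreement between the two branches' predicted class distributions.
    """
    first = cross_entropy(text_logits, label)    # loss on product information
    second = cross_entropy(image_logits, label)  # loss on product picture
    kl = kl_divergence(softmax(image_logits), softmax(text_logits))
    return first + second + kl_weight * kl
```

In an actual training loop this value would be minimized with a deep learning framework's automatic differentiation; after training, only the image branch is kept, as the text describes.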
Once the image feature analysis model is obtained, this embodiment uses it to analyze each commodity picture and obtain the corresponding image feature data.
As an optional embodiment of the present application, in practical applications a CP photographing a commodity is quite likely to capture objects other than the commodity, so a commodity picture may contain multiple objects. Analyzing the picture directly would then yield image feature data that also reflects those other objects, which hinders subsequent image matching. Therefore, in this embodiment, commodity detection is performed on the commodity picture before image feature analysis. Referring to FIG. 4A, S301 can then be replaced with:
S3011, the server acquires the commodity picture, performs commodity detection on it, and crops out the commodity image according to the detection result.
S3012, the server performs image feature analysis on the commodity image and stores the resulting image feature data in a feature library.
The embodiments of the present application do not restrict the method of commodity detection (essentially object recognition); technical personnel can set it according to actual needs. For example, in some embodiments an object localization method is first used to locate all objects in the product picture, such as the object localization algorithms provided by OpenCV, the SSD (Single Shot MultiBox Detector) algorithm, or other deep-learning-based object localization models. Because a commodity is usually placed directly in front of the camera lens when photographed, after the objects are located, the object occupying the largest pixel area can be identified as the commodity, and the image inside its bounding box cropped out.
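Choosing the commodity among the detected objects by area, then cropping it, can be sketched as follows. The detector itself is out of scope: the sketch assumes some localization step (OpenCV, SSD, or another model) has already returned axis-aligned boxes as `(x1, y1, x2, y2)` pixel coordinates with exclusive right and bottom edges, approximates each object's pixel area by its box area, and represents the image as a plain list of pixel rows. `box_area`, `pick_commodity_box`, and `crop` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def box_area(box: Box) -> int:
    """Pixel area of a box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def pick_commodity_box(boxes: Sequence[Box]) -> Box:
    """Treat the detected object with the largest area as the commodity."""
    if not boxes:
        raise ValueError("no objects detected in the commodity picture")
    return max(boxes, key=box_area)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the pixels inside the box out of a row-major image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]
```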
As an optional embodiment of the present application, in practical applications the size of the cropped commodity image cannot be known in advance. Analyzing it directly could make the image feature data unpredictable, which hinders subsequent operations such as image feature matching. Therefore, before S3012, the commodity image may first be padded along its shorter side so that its width and height are equal, and then scaled to a preset size, for example 299×299 pixels. The scaled commodity image is then used as the input to the image feature analysis in S3012, so the amount of resulting image feature data is more controllable.
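The pad-then-scale normalization before S3012 can be sketched in pure Python on a grayscale image stored as a list of rows. Centering the original inside the square, the fill value 0, and nearest-neighbor resampling are assumptions of this sketch; a production system would more likely use an image library's resize with proper interpolation.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, image[row][col]


def pad_to_square(image: Image, fill: int = 0) -> Image:
    """Pad the shorter side so height equals width, keeping the content centered."""
    height, width = len(image), len(image[0])
    size = max(height, width)
    top, left = (size - height) // 2, (size - width) // 2
    out = [[fill] * size for _ in range(size)]
    for r in range(height):
        for c in range(width):
            out[top + r][left + c] = image[r][c]
    return out


def resize_nearest(image: Image, size: int = 299) -> Image:
    """Scale a square image to size x size with nearest-neighbor sampling."""
    n = len(image)
    return [[image[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def normalize_commodity_image(image: Image, size: int = 299) -> Image:
    """Pad to square, then scale to the preset size (299 x 299 in the text's example)."""
    return resize_nearest(pad_to_square(image), size)
```

Padding before scaling preserves the commodity's aspect ratio; scaling a non-square crop directly would stretch it.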
As an optional embodiment of the present application, the commodity detection in S3011 can also be performed manually by a CP or a technician, who draws a box around the commodity in the commodity picture; the server then crops out the commodity image.
As another optional embodiment of the present application, the number of commodities in practical applications is often large, and it can multiply as more CPs use the commodity management system. Correspondingly, the volume of image feature data obtained from commodity pictures also increases sharply, placing heavy storage pressure on the feature library. To reduce this storage pressure, and thus the cost of the feature library, this embodiment compresses the image feature data after obtaining it and stores the compressed data in the feature library. The compression method is not restricted and can be set by technical personnel as required; for example, the precision of the image feature data can be reduced to shrink its size.
As an optional embodiment of image feature data compression in this application, the compression method may be any one of the following:
1. Principal component analysis (PCA) dimensionality reduction: reduce the image feature data from 1024 dimensions to 512-, 256-, or 128-dimensional floating-point data.
2. Convert the numeric type of the image feature data from floating point to 8-bit unsigned integer (Uint8), keeping the dimension unchanged.
3. Use deep hashing training to convert floating-point image feature data into binary image feature data.
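Methods 2 and 3 above can be sketched on a single feature vector. Min-max scaling to 0..255 is one common way to perform the Uint8 conversion, not necessarily the one intended here; for method 3, only the final step is shown: thresholding the outputs of a (hypothetically already hash-trained) network at zero, packing the bits, and comparing codes by Hamming distance. PCA is omitted because it requires fitting a projection on a corpus of features.

```python
from typing import List, Sequence


def quantize_uint8(features: Sequence[float]) -> List[int]:
    """Method 2: map floats to 0..255 by min-max scaling; dimension is unchanged."""
    lo, hi = min(features), max(features)
    if hi == lo:
        return [0] * len(features)
    return [round((x - lo) / (hi - lo) * 255) for x in features]


def binarize(features: Sequence[float]) -> bytes:
    """Method 3 (final step only): one bit per dimension, 8 bits packed per byte."""
    bits = [1 if x > 0 else 0 for x in features]
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Assuming 32-bit floats, a 1024-dimensional vector takes 4096 bytes; method 2 reduces it to 1024 bytes and method 3 to 128 bytes. Per-vector min-max scaling discards each vector's original range, so storing `lo` and `hi` alongside, or using one global range, would be needed if approximate float values must be recovered.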
As an optional embodiment of the present application, the offline part of FIG. 4B is a schematic flowchart of a method for performing image feature analysis on commodity pictures before searching, described as follows:
The server obtains product pictures from the product data and adds an API for product image search.
The server performs commodity detection on each product picture and crops out the commodity image according to the detection result.
The server performs image feature analysis on the commodity image to obtain image feature data.
The server compresses the obtained image feature data and stores the compressed image feature data in the feature library.
For specific operation details, refer to S103, S3011-S3012, and the above description of the image feature data compression embodiments; they are not repeated here.
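The offline flow can be summarized as a composition of its steps. This sketch keeps each step pluggable: `detect_and_crop`, `extract`, and `compress` are hypothetical callables standing in for the detection, analysis, and compression steps, and modeling the feature library as an in-memory dict keyed by product ID is an assumption about storage.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_feature_library(
    products: Iterable[Tuple[str, bytes]],
    detect_and_crop: Callable[[bytes], bytes],
    extract: Callable[[bytes], List[float]],
    compress: Callable[[List[float]], bytes],
) -> Dict[str, bytes]:
    """Run detection, feature analysis, and compression for each product picture."""
    library: Dict[str, bytes] = {}
    for product_id, picture in products:
        commodity_image = detect_and_crop(picture)  # commodity detection + crop (S3011)
        features = extract(commodity_image)         # image feature analysis (S3012)
        library[product_id] = compress(features)    # compress before storing
    return library
```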
(2) Searching.
Once the image feature data of commodity pictures has been stored, the commodity management system in this embodiment provides users with a commodity search function, covering both text search and image search. To support both, this embodiment requires the commodity data uploaded by a CP to include the commodity picture or its download address.
During commodity search, the user can send a commodity search request to the commodity management system through the user terminal, uploading the commodity picture or commodity text (i.e., descriptive text about the commodity) to be searched. After receiving it, the commodity management system searches the stored commodity data and determines one or more matching commodities. It then returns the attribute data and commodity pictures of the matched commodities to the user terminal for display.
As an optional embodiment of the present application, the logical architecture of commodity search is shown schematically in FIG. 4C or FIG. 4D. The commodity management system is responsible for real-time management of commodity information, including adding, deleting, modifying, and querying commodities, as well as offline import of commodity data. Specifically, this covers searching by commodity picture and commodity text, and ingesting commodity data (i.e., generating and storing commodity information).
In FIG. 4C, the user terminal uploads the commodity picture or commodity text to be searched directly to the commodity management system, which performs the search and returns the resulting commodity list. The right half of FIG. 4C indicates that the commodity management system can provide commodity data management support for e-commerce partners: a partner can act as a CP and upload commodity data to the system, which then ingests and manages the data using the embodiments shown in FIG. 2A to FIG. 3A.
FIG. 4D adds a commodity distribution service layer on top of FIG. 4C. This layer mainly interfaces with media portals: the user terminal uploads the commodity picture or commodity text to be searched to the commodity distribution service through a media portal. The commodity distribution service manages the pictures and texts uploaded by all user terminals in a unified way, sends them to the commodity management system to request commodity matching, and returns the commodity list generated by the commodity management system to the user terminal. When there are many users, the commodity search workload can be considerable; adding a layer that interfaces with media portals and manages search requests centrally makes handling users' search requests more efficient and reliable. In practice, the commodity distribution service can run on a dedicated server, or on a server chosen at random from the multiple servers in the commodity management system; technical personnel can set this according to actual needs, and it is not limited here.
Text search and image search are described in detail below:
a. Text search. Referring to FIG. 5A, the text search process includes:
S401, the user terminal sends the commodity text entered by the user to the server.
The server, one of the execution bodies of this embodiment, is the server in the commodity management system responsible for commodity search. It may be a server pre-selected by a technician in the commodity management system, or one selected automatically according to some rule, for example at random, from the one or more servers in the system. This is not limited here.
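The two server-selection options (a technician's preselection, or an automatic rule such as random choice) can be sketched as one function. The function name and server identifiers are hypothetical, and random choice is only the example rule the text gives.

```python
import random
from typing import Optional, Sequence


def choose_search_server(servers: Sequence[str], preselected: Optional[str] = None,
                         rng: Optional[random.Random] = None) -> str:
    """Return the preselected server if given, otherwise pick one at random."""
    if not servers:
        raise ValueError("the commodity management system has no servers")
    if preselected is not None:
        if preselected not in servers:
            raise ValueError(f"unknown server: {preselected}")
        return preselected
    # An explicit Random instance makes the choice reproducible in tests.
    rng = rng or random.Random()
    return rng.choice(servers)
```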
In this embodiment, a commodity search function is provided on the user terminal. When a user needs to search for a commodity, the user can open this function and enter the commodity text to be searched or upload a corresponding commodity picture.
The commodity search function can be provided in at least the following ways:
1. Integrated into the user terminal's local search function, such as the search on a mobile phone's minus-one screen (the leftmost home screen).
2. Embedded in an app or web page, so that the user can use the commodity search function while using that app or web page.
For example, refer to FIG. 5B, where the user terminal is a mobile phone.
In FIG. 5B(a), the commodity search function is integrated, as an input box, into the phone's local search function, which the user can open when needed. For example, the commodity search function can be placed on the phone's minus-one screen: when the user opens that screen, the commodity search function is enabled and the corresponding input box is displayed.
The user can then enter commodity text in the input box or upload a commodity picture. After obtaining the text or picture, the phone uploads it to the server in the commodity management system.
In FIG. 5B(b), the commodity search function is embedded, as an input box, in a web page. The user can visit the page in the phone's browser and enter commodity text in the input box or upload a commodity picture. After obtaining the text or picture, the web page uploads it to the server in the commodity management system.
This embodiment uses a user entering commodity text as the example for describing text search. Commodity text is descriptive text about a commodity; it can be a passage or a few keywords, such as the commodity's name, characteristics, or brand. In practice, the user enters the commodity text based on what they already know about the commodity being searched for, so its actual content depends on the application scenario. For example, it may be a keyword such as "shorts", "skirt", or "bread", or a phrase such as "5G full-screen phone, 50 MP quad camera".
S402, the server matches the commodity text against the commodity information of each commodity in the database and selects the first commodity information of the top n commodities with the highest text matching degree, where n is a positive integer.
After obtaining the commodity text, the server in this embodiment matches it against the commodity information of each commodity in the database to obtain the matching degree between each commodity and the commodity text. The text matching method is not restricted and can be set by technical personnel according to actual needs. For example, a semantic text matching method can be used, such as a neural-network-based semantic matching model, or a character-based matching method such as the brute-force (BF) algorithm, the Rabin-Karp (RK) string matching algorithm, or the Knuth-Morris-Pratt (KMP) string search algorithm.
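As an illustration of the character-based option, here is the Knuth-Morris-Pratt search, plus a hypothetical `keyword_hits` score that counts how many query keywords occur in a commodity's information. Using keyword hits as the "matching degree" is an assumption made for this sketch; the text leaves the scoring method open.

```python
from typing import List, Sequence


def prefix_function(pattern: str) -> List[int]:
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Start indices of all (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches: List[int] = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # fall back instead of re-scanning the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = pi[k - 1]
    return matches


def keyword_hits(info: str, keywords: Sequence[str]) -> int:
    """Hypothetical matching degree: number of query keywords found in the info text."""
    return sum(1 for kw in keywords if kmp_find_all(info, kw))
```

KMP runs in time linear in the text plus pattern length, whereas brute force can degrade to their product; because it works on characters, it handles Chinese text without word segmentation.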
In practical applications, the search results returned to a user should not contain too many commodities. Therefore, after computing the text matching degree between the commodity text and each commodity's information, this embodiment selects some commodities with a high matching degree and uses their commodity information (i.e., the first commodity information) as the matching result. The number n of selected commodities is not limited here and can be set by technical personnel, for example to any value from 10 to 20 or from 20 to 100.
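Selecting the top n commodities does not require sorting the whole catalog: Python's `heapq.nlargest` keeps only n candidates while scanning. The token-overlap `overlap_score` below is a hypothetical stand-in for whichever matching method is chosen, and the catalog is assumed to map a commodity ID to its commodity information text.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def overlap_score(query: str, info: str) -> float:
    """Hypothetical matching degree: share of query words present in the info text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(info.lower().split())) / len(query_words)


def top_n_matches(query: str, catalog: Dict[str, str],
                  score: Callable[[str, str], float] = overlap_score,
                  n: int = 10) -> List[Tuple[str, float]]:
    """Return (commodity_id, score) for the n best-matching commodities, best first."""
    scored = ((cid, score(query, info)) for cid, info in catalog.items())
    return heapq.nlargest(n, scored, key=lambda item: item[1])
```

Whitespace tokenization only suits space-delimited languages; Chinese commodity text would need word segmentation or a character-based score instead.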
As an optional embodiment of the present application, the database may store a very large number of commodities, making direct text matching costly. To reduce the text matching workload and improve matching efficiency, this embodiment first filters commodities by category. S402 can then be replaced with S4021-S4022.
S4021, the server identifies the commodity category from the commodity text to obtain the corresponding first category.
In this embodiment, the CP is required to provide the commodity's category attribute data in the commodity data. Accordingly, the commodity information stored in the database records the category to which each commodity belongs. The specific category classification rules are not restricted; they can be preset by technical personnel and communicated to the CP.
After receiving the commodity text, the server first identifies the commodity category, i.e., determines which category the commodity the user is searching for belongs to. The specific category identification method is not restricted and can be set by technical personnel. For example, keyword matching can be used: technical personnel set in advance some common keywords for each category, which can be recorded as a list of commodity nouns. Suppose, for instance, that the categories include "clothing"; related keywords such as "clothes", "tops", "pants", and "skirts" can then be set under "clothing". After the commodity text is obtained, it is searched for these keywords, and the category to which a found keyword belongs is taken as the category of the commodity text (i.e., the first category).
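The keyword-list approach to category identification can be sketched as follows. The keyword lists are hypothetical examples, and returning `None` when no keyword is found (for instance, so the caller can fall back to matching across all categories) is an assumption; the text does not say what happens in that case.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists; in practice technical personnel maintain one per category.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "clothing": ["clothes", "tops", "pants", "skirt", "shorts"],
    "food": ["bread", "milk", "snack"],
}


def identify_category(text: str,
                      keywords: Dict[str, List[str]] = CATEGORY_KEYWORDS) -> Optional[str]:
    """Return the first category whose keyword appears in the text, else None."""
    lowered = text.lower()
    for category, words in keywords.items():
        if any(word in lowered for word in words):
            return category
    return None
```

Plain substring matching is simple but can misfire (for example, "tops" inside "laptops"), and when several categories match, the first one in dictionary order wins; a real system would likely tokenize the text or rank the candidate categories.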
S4022, the server matches the commodity text against the commodity information of commodities under the first category in the database, and selects the first commodity information of at least one commodity with the highest text matching degree.
After determining the category of the commodity text, the server matches commodity information only for commodities under that category in the database. For example, if the commodity text corresponds to "clothing", the server matches only against commodities under "clothing" in the database and obtains their text matching degrees.
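Restricting matching to the first category can be sketched as a filter in front of any scoring function. The product dictionaries with `category` and `info` keys and the `score` callable are hypothetical, and scoring every product when no category was identified is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional


def match_within_category(query: str, first_category: Optional[str],
                          products: List[Dict[str, str]],
                          score: Callable[[str, str], float],
                          n: int = 10) -> List[Dict[str, str]]:
    """Score only products in the first category; with no category, score all."""
    if first_category is None:
        candidates = products
    else:
        candidates = [p for p in products if p["category"] == first_category]
    # Python's sort is stable, so equally scored products keep their stored order.
    ranked = sorted(candidates, key=lambda p: score(query, p["info"]), reverse=True)
    return ranked[:n]
```

Besides saving work, the filter also drops cross-category false hits: with the query "short" and the category "clothing", a food product such as "shortbread cookies" is never scored.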
S403, the server retrieves the attribute data in the first commodity information from the database and the commodity pictures associated with the first commodity information from the NSP, generates a commodity list from the retrieved attribute data and commodity pictures, and sends the commodity list to the user terminal.
Referring to FIG. 5A, S403 can be refined into S4031-S4033:
S4031, the server retrieves the attribute data in the first commodity information from the database.
S4032, the server retrieves the commodity pictures associated with the first commodity information from the NSP.
S4033, the server generates a commodity list from the retrieved attribute data and commodity pictures, and sends the commodity list to the user terminal.
After the commodity information is selected, this embodiment retrieves the attribute data it contains from the database, obtains the corresponding commodity pictures from the NSP, and sends the retrieved attribute data and commodity pictures to the user terminal.
Note that commodity information contains many attributes, some of which may not matter to the user. For example, if the commodity information contains the download address of the commodity picture, that address is of no use to the user, because this embodiment downloads the picture from the NSP. For this reason, the retrieved attribute data may be some or all of the attribute data in the commodity information, and the exact set can be configured by technical personnel according to actual needs. For example, it can be configured to include the commodity's name, price, and link; if the commodity information contains a description of the commodity, the description can also be included. A link may be any one or more of a web page link, an app link, and a quick app link, used to jump to the corresponding web page (including HTML5 pages), app page, or quick app page that displays the commodity. In this embodiment, the web pages, app pages, and quick app pages that links point to are collectively called commodity display pages.
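Selecting the configured subset of attributes is a simple projection. The field names below, including the dropped `picture_url`, are hypothetical; the default set follows the text's example of name, price, link, and an optional description.

```python
from typing import Any, Dict, Iterable

# Configurable display fields; "description" is kept only when the commodity has one.
DISPLAY_FIELDS = ("name", "price", "link", "description")


def select_display_attributes(info: Dict[str, Any],
                              fields: Iterable[str] = DISPLAY_FIELDS) -> Dict[str, Any]:
    """Keep only the configured attributes that are present in the commodity information."""
    return {field: info[field] for field in fields if field in info}
```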
It can be understood that this embodiment does not restrict which e-commerce platform the commodity display page pointed to by a link belongs to. In principle, a CP can set a commodity's links according to its partnerships with different e-commerce platforms, so in practice a commodity's links may point to display pages on one or more e-commerce platforms. The user can then click whichever link suits them to jump to the commodity display page on a given platform. Alternatively, link priorities can be preset, and the user terminal automatically jumps to the commodity display page pointed to by the higher-priority link.
For example, suppose CP1 sells commodity A on both e-commerce platform A and e-commerce platform B, i.e., both platforms have a display page for commodity A, and each platform has its own website, app, and quick app. CP1 can then include in the commodity data the links for platforms A and B on the website, app, and quick app respectively, i.e., at least 6 links in total.
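The preset-priority option can be sketched with the six links from this example. Representing a link as `(platform, page type, URL)`, the page-type labels, and the example.com URLs are all hypothetical; the priority list is an ordered sequence of `(platform, page type)` pairs, most preferred first.

```python
from typing import Optional, Sequence, Tuple

# (platform, page type, URL); page types "web", "app", "quick_app" are hypothetical labels.
Link = Tuple[str, str, str]


def pick_link(links: Sequence[Link], priority: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the URL of the highest-priority link, or None if no link is ranked."""
    rank = {key: position for position, key in enumerate(priority)}
    ranked = [link for link in links if (link[0], link[1]) in rank]
    if not ranked:
        return None
    best = min(ranked, key=lambda link: rank[(link[0], link[1])])
    return best[2]


# Platforms A and B, each with a website, app, and quick app page: 6 links.
links = [(p, t, f"https://{p}.example/{t}") for p in ("A", "B") for t in ("web", "app", "quick_app")]
print(pick_link(links, [("B", "app"), ("A", "web")]))  # https://B.example/app
```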
After obtaining the attribute data and commodity images, this embodiment sorts them with a single commodity as the unit. The commodities are first sorted according to certain rules, and their attribute data and commodity images are then arranged in the same commodity order. After sorting, the attribute data and commodity image of each commodity are placed in the same row, and different commodities occupy different rows. This yields a commodity list composed of the sorted attribute data and commodity images. The commodity list is then returned to the user terminal as the search result for the commodity text.
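The per-commodity ordering described above can be sketched as follows. The text does not specify the sort rule, so this sketch assumes a hypothetical relevance score per commodity. `Row` and `build_commodity_list` are illustrative names and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Row:
    """One row of the commodity list: a single commodity's attributes and image."""
    commodity_id: str
    attributes: dict[str, Any]
    image: bytes


def build_commodity_list(scores: dict[str, float],
                         attributes: dict[str, dict[str, Any]],
                         images: dict[str, bytes]) -> list[Row]:
    # 1. Sort the commodities themselves (here by a hypothetical score,
    #    highest first, with ties broken by id for a deterministic order).
    ordered_ids = sorted(scores, key=lambda cid: (-scores[cid], cid))
    # 2. Arrange attributes and images in that commodity order: one row per
    #    commodity, holding that commodity's attribute data and image together.
    return [Row(cid, attributes[cid], images[cid]) for cid in ordered_ids]
```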
S404: The user terminal displays the commodity list.
After receiving the commodity list, the user terminal displays it on the screen so that the user can see the search results for the commodity text. This embodiment places no particular restriction on how the commodity list is presented; technical personnel can set this as needed.
As an optional embodiment of this application, a card may be generated for each commodity in the commodity list, and that commodity's attribute data and commodity image are displayed together on the card. The display screen of the user terminal then shows one card per commodity.
An example follows. Refer to FIG. 5C, which builds on the embodiment shown in (a) of FIG. 5B. Suppose the user enters the commodity text "red wine glass". The search results contain 4 commodities. Each has three attributes (name, price, and link) and a corresponding commodity image. This embodiment generates one card per commodity and displays the commodity image and each attribute on that card.
As an optional embodiment of this application, if the commodity list includes links to commodities, each link is displayed on the card as a control. When the terminal detects that the user has clicked a link, it jumps to the commodity display page that the link points to.
Once the commodity list is displayed, the user may click a commodity's link in the list. The user terminal then opens the commodity display page corresponding to that link:
- For a web page link, the terminal launches the browser and opens the web page that displays the commodity.
- For an App link, it launches the corresponding App and opens the App page that displays the commodity.
- For a quick app link, it launches the corresponding quick app and opens the quick app page that displays the commodity.
An example follows. Refer to FIG. 5D, which builds on the example shown in FIG. 4C. Suppose the user clicks web page link 1 on the first commodity card (see (a) in FIG. 5D). The user terminal then launches the browser and opens the web page that displays the commodity (see (b) in FIG. 5D). There the user can view the commodity details and make a purchase or perform other operations.
As another optional embodiment of this application, the priorities of the different links may be preset by technical personnel or by the CP. This serves the CP's cooperation needs with different e-commerce platforms and improves the user experience. After receiving a commodity list containing links, the user terminal does not display the links themselves. Refer to (a) in FIG. 5E, which builds on the embodiment shown in (a) of FIG. 5B. Suppose the user enters the commodity text "red wine glass". The search results contain 4 commodities. Each has three attributes (name, price, and link) and a corresponding commodity image. This embodiment generates one card per commodity and displays on it the commodity image and every attribute except the link.
Once the commodity list is displayed, the user may click the card of a commodity whose links are included in the list. The user terminal then opens the commodity display page pointed to by the highest-priority link. If that fails, it tries the page pointed to by the link with the next-highest priority. It continues in this way until a commodity display page opens successfully.
An example follows. Suppose the link priorities preset by technical personnel are, from high to low: e-commerce platform App link, quick app link, and web page link. Referring to (a) in FIG. 5E, suppose the user clicks the first commodity. The user terminal checks, in the order e-commerce platform App link, quick app link, web page link, whether the commodity has a link of each type.
- If the commodity has an e-commerce platform App link, the user terminal launches the e-commerce platform App and jumps to the corresponding page (see (b) in FIG. 5E).
- If the commodity has only a web page link, the user terminal launches the browser and jumps to the corresponding page (see (c) in FIG. 5E).

If the user terminal does not have the App, quick app, or browser that a link requires, the jump fails. In that case this embodiment selects the link with the next-highest priority and jumps to it instead.
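The sketch below shows the priority-based fallback described above. The link-type keys and the `try_open` callback are hypothetical stand-ins for whatever the terminal uses to launch an App, quick app, or browser. The priority order follows the example in the text.

```python
from collections.abc import Callable
from typing import Optional

# Priority order from the example: e-commerce platform App link, then quick
# app link, then web page link. The key names are hypothetical.
DEFAULT_PRIORITY = ("app", "quick_app", "web")


def open_by_priority(links: dict[str, str],
                     try_open: Callable[[str, str], bool],
                     priority: tuple[str, ...] = DEFAULT_PRIORITY) -> Optional[str]:
    """Try a commodity's links from highest to lowest priority.

    `try_open(kind, url)` is a hypothetical hook that launches the App, quick
    app or browser and returns False if the jump fails (e.g. the App is not
    installed). Returns the URL that was opened, or None if every link failed.
    """
    for kind in priority:
        url = links.get(kind)
        if url is None:
            continue  # the commodity has no link of this type
        if try_open(kind, url):
            return url
        # Jump failed: fall through to the next-highest-priority link.
    return None
```

For instance, on a terminal that has a quick app runtime and a browser but not the platform App, `open_by_priority({"app": "app://item/1", "web": "https://shop.example/item/1"}, lambda kind, url: kind in {"quick_app", "web"})` skips the failed App link and returns the web URL.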
In this embodiment, the user can search for commodities by entering commodity text on the user terminal. The user can view the attribute data of one or more matching commodities on the terminal and open commodity display pages as needed. This makes commodity search much easier for users and increases commodity exposure.
b. Image search. Referring to FIG. 6, the image search flow includes:
S501: The user terminal uploads the commodity image selected by the user to the server.
The operation of S501 is basically the same as that of S401. For the specific operation details, principles, and beneficial effects, refer to the description of S401, which is not repeated here.
S501 differs from S401 in the following respects:
1. In this embodiment, the user selects a local image on the user terminal and uploads it to the server as the commodity image. Alternatively, the user can take a photo with the user terminal and upload it to the server as the commodity image.
2. In principle, the entry point of the image search function can be embedded in any function that takes photos or browses images. For example, besides the entry point shown in FIG. 5B, image search can be embedded in the camera function of the user terminal. After photographing an object in daily life, the user can then immediately use image search to find the corresponding commodity. Likewise, image search can be embedded in the gallery of the user terminal. While browsing the gallery, the user can then use image search to find the commodity corresponding to a given image.
In practice, technical personnel can place the image search entry point in one or more functions of the user terminal as required. Embedding the entry point in different functions has two benefits. First, image search becomes convenient for users, enabling "snap-and-shop" anytime, anywhere. Second, commodity exposure increases, bringing more traffic to merchants and e-commerce platforms.
S502: The server performs image feature analysis on the received commodity image to obtain first image feature data.
In this embodiment, the commodity image uploaded by the user is analyzed with the same image feature analysis method used, before any search, on the commodity images already uploaded. For the image feature analysis, refer to the description of S301, which is not repeated here.
As an optional embodiment of this application, note that commodities within a category share certain characteristics. For example, commodities in the same category often have similar shapes. Different image feature extraction models may therefore be designed in advance for different commodity categories, so that the resulting image feature data better characterizes the commodity. During image feature analysis, the server first performs category recognition (also called intent classification) on the commodity image. This determines the category of the commodity being searched for. The server then selects the corresponding model and performs the analysis.
In this embodiment, category recognition of a commodity image is essentially automatic classification of the commodity. A category classification model can therefore be set up in advance for the known commodity categories and used to classify and identify the commodity's category. This embodiment places no particular restriction on the type or architecture of the category classification model. Technical personnel can set it according to actual needs.
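The two-stage analysis, category recognition followed by a category-specific extraction model, can be sketched as a simple dispatch. The classifier, the per-category extractors, and the generic fallback model are hypothetical placeholders, because the text leaves the model types and architectures open.

```python
from collections.abc import Callable, Sequence

Image = Sequence[float]  # hypothetical stand-in for decoded pixel data
FeatureExtractor = Callable[[Image], list[float]]


def extract_features(image: Image,
                     classify: Callable[[Image], str],
                     extractors: dict[str, FeatureExtractor],
                     fallback: FeatureExtractor) -> tuple[str, list[float]]:
    """Category recognition ("intent classification") first, then the
    feature extraction model registered for that category."""
    category = classify(image)
    # Unknown category: fall back to a generic (hypothetical) model.
    extractor = extractors.get(category, fallback)
    return category, extractor(image)
```

A registry keyed by category lets new category models be added without touching the search flow.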
As an optional embodiment of this application, in the pre-search stage S301 may use an image feature analysis model obtained through multimodal fusion (for details of this model, see the corresponding embodiment of S301). In that case, S502 also analyzes the received commodity image with the same model as S301. This yields the corresponding image feature data (that is, the first image feature data).
It should be understood that commodity detection on the commodity image also applies to this embodiment, so S3011-S3012 can be applied here. Correspondingly, S502 can be replaced with:
S5021: The server performs commodity detection on the received commodity image and crops out the commodity according to the detection result.
S5022: The server performs image feature analysis on the cropped commodity image to obtain the first image feature data.
The operations of S5021-S5022 are basically the same as those of S3011-S3012. For the specific operation details, principles, and beneficial effects, refer to the description of S3011-S3012, which is not repeated here.
S503: The server matches the first image feature data against the image feature data in the feature library. It selects the n items of second image feature data with the highest matching degree and determines the n commodities they correspond to.
After obtaining the image feature data of the commodity image uploaded by the user (the first image feature data), this embodiment uses it to match against the image feature data stored in the feature library. It selects the n items of image feature data with the highest matching degree (the second image feature data). The commodities corresponding to these items become the target commodities of this search.
This embodiment places no particular restriction on the feature matching method; technical personnel can set it according to actual needs. For example, an open-source retrieval engine such as Faiss can be used. It computes the similarity between image features and returns commodities in descending order of similarity. The number n of image feature data items to select is not limited here and can be set by technical personnel, for example to any value in the range 10 to 20 or 20 to 100.
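The following exhaustive top-n search illustrates what a retrieval engine such as Faiss computes here. It ranks library entries by cosine similarity to the query features. This is a brute-force sketch with hypothetical names (`top_n_matches`, `feature_library`), not Faiss itself. In Faiss, a flat inner-product index over L2-normalized vectors performs the same exact search at scale.

```python
import heapq
import math
from collections.abc import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_n_matches(query: Sequence[float],
                  feature_library: dict[str, Sequence[float]],
                  n: int) -> list[tuple[str, float]]:
    """Return up to n (commodity_id, similarity) pairs, most similar first.

    Exhaustive search: every library entry is scored, O(library size).
    """
    scored = ((cosine(query, feat), cid) for cid, feat in feature_library.items())
    best = heapq.nlargest(n, scored)
    return [(cid, sim) for sim, cid in best]
```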
S504: Obtain the commodity images of the n commodities from the NSP and their attribute data from the database. Generate a commodity list from the acquired attribute data and commodity images, and send it to the user terminal.
Referring to FIG. 6, S504 can be broken down into S5041-S5043:
S5041: The server obtains the commodity images of the n commodities from the NSP.
S5042: The server obtains the attribute data of the n commodities from the database.
S5043: The server generates a commodity list from the acquired attribute data and commodity images and sends it to the user terminal.
After determining the n commodities found by this search, this embodiment downloads their commodity images from the NSP and their attribute data from the database. It then generates a commodity list from the acquired attribute data and commodity images and sends the list to the user terminal. Downloading the attribute data and generating the commodity list are basically the same as in S406. For details, refer to the description of S406, which is not repeated here.
As an optional embodiment of this application, the commodity list can be ordered more usefully by ranking commodities with higher similarity as near the top as possible. Therefore, in S504, before "sending the commodity list to the user terminal", the server may also reorder (rerank) the attribute data and commodity images of the commodities in the list. For example, commodity color matters considerably to the user experience in practice. The list can therefore be reordered by color: commodity images close in color to the image uploaded by the user terminal, together with their attribute data, are moved to the front of the list.
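The color-based reranking example can be sketched as below. It assumes images are available as lists of RGB pixels and uses the distance between mean colors as a hypothetical closeness measure. The text does not specify how color similarity is computed.

```python
import math
from collections.abc import Sequence

Pixel = tuple[int, int, int]  # (R, G, B), hypothetical image representation


def mean_color(pixels: Sequence[Pixel]) -> tuple[float, float, float]:
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def rerank_by_color(query_pixels: Sequence[Pixel],
                    ranked: list[tuple[str, Sequence[Pixel]]]) -> list[str]:
    """Reorder (commodity_id, pixels) pairs so that images whose mean color is
    closest to the uploaded image come first. sorted() is stable, so commodities
    at equal color distance keep their feature-matching order."""
    target = mean_color(query_pixels)
    ordered = sorted(ranked, key=lambda item: math.dist(mean_color(item[1]), target))
    return [cid for cid, _ in ordered]
```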
One possible implementation of the reranking in this application includes:
S601: The server performs trademark detection on the commodity image uploaded by the user terminal to obtain the first trademark information contained in the image.
S602: The server performs trademark detection on each commodity image in the commodity list to obtain the second trademark information contained in those images.
S603: The server matches the first trademark information against each item of second trademark information. It then sorts the attribute data and commodity images of the commodities in the list in descending order of matching degree.
Here, trademark information (both first and second) includes at least one of a trademark name and a trademark pattern. Technical personnel can choose which according to actual needs. In addition, this embodiment places no particular restriction on how trademark information is detected; technical personnel can choose the method. For example, an image recognition method based on a neural network model can be used. Alternatively, a set of trademark images can be preset and image matching performed against them.
In this embodiment, matching on the image feature data of the commodity images is followed by a second round of matching on the trademark information contained in the images. The commodity list already obtained is then reordered according to the result of this second round. As a result, the attribute data and commodity images of the commodities most similar to the one the user is searching for are shown first on the user terminal.
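Steps S601-S603 can be sketched as follows. The sketch assumes trademark detection has already produced a trademark name, or `None`, for each image. It uses the string-similarity ratio from Python's `difflib` only as an illustrative matching degree, because the text leaves the matching method open.

```python
from difflib import SequenceMatcher
from typing import Optional


def trademark_match(first: Optional[str], second: Optional[str]) -> float:
    """Illustrative matching degree in [0, 1] between two detected trademark
    names; 0.0 if either image had no detectable trademark."""
    if not first or not second:
        return 0.0
    return SequenceMatcher(None, first.lower(), second.lower()).ratio()


def rerank_by_trademark(first_trademark: Optional[str],
                        commodity_list: list[tuple[str, Optional[str]]]) -> list[str]:
    """commodity_list holds (commodity_id, second_trademark) pairs in their
    feature-matching order. The result is sorted by matching degree, high to
    low; sorted() stays stable with reverse=True, so ties keep that order."""
    ordered = sorted(commodity_list,
                     key=lambda item: trademark_match(first_trademark, item[1]),
                     reverse=True)
    return [cid for cid, _ in ordered]
```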
Another possible implementation of the reranking in this application includes:
S604: The server performs trademark detection on the commodity image uploaded by the user terminal to obtain the first trademark information contained in the image.
S605: The server extracts the third trademark information of the n commodities from the acquired attribute data.
S606: The server matches the first trademark information against each item of third trademark information. It then sorts the attribute data and commodity images of the commodities in the list in descending order of matching degree.
Here, trademark information (both first and third) is the trademark name. First, this embodiment performs trademark detection on the commodity image to identify the name of the trademark it contains (the first trademark information). Second, it looks up each commodity's trademark name (the third trademark information) in the attribute data of the n commodities. A second round of matching is then performed on the trademark names. The commodity list already obtained is reordered according to the result. As a result, the attribute data and commodity images of the commodities most similar to the one the user is searching for are shown first on the user terminal.
A further possible implementation of the reranking in this application includes:
S607: The server performs trademark detection on the commodity image uploaded by the user terminal to obtain the first trademark information contained in the image.
S608: The server performs trademark detection on each commodity image in the commodity list to obtain the second trademark information contained in those images.
S609: The server extracts the third trademark information of the n commodities from the acquired attribute data.
S6010: The server matches the first trademark information against the second and third trademark information of the n commodities. It then sorts the attribute data and commodity images of the commodities in the list in descending order of matching degree.
In this embodiment, the first trademark information includes the trademark name and may additionally include the trademark pattern. If the first trademark information contains only the trademark name, the second trademark information is the trademark name. If the first trademark information contains both the trademark name and the trademark pattern, the second trademark information may contain either or both of them. The third trademark information is the trademark name.
First, this embodiment performs trademark detection on the uploaded commodity image to identify the first trademark information it contains. Second, it looks up each commodity's trademark name (the third trademark information) in the attribute data of the n commodities. It also identifies the second trademark information in the commodity images of the n commodities. A second round of matching is then performed on these three types of trademark information. The commodity list already obtained is reordered according to the result. As a result, the attribute data and commodity images of the commodities most similar to the one the user is searching for are shown first on the user terminal.
This embodiment places no particular restriction on how trademark information is matched; technical personnel can choose the method according to actual needs. In some optional embodiments, for example, matching proceeds in three steps:
1. The first trademark information is matched against each item of second trademark information to obtain a first matching degree for each of the n commodities.
2. The first trademark information is matched against each item of third trademark information to obtain a second matching degree for each of the n commodities.
3. A final matching degree for each commodity is determined from the first and second matching degrees (for example, by a weighted sum) and used as the matching result.
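The weighted-sum combination can be sketched as follows. The default equal weights and the function names are hypothetical. The text only gives weighted summation as one possible way to combine the two matching degrees.

```python
def final_matching_degree(first_degree: float, second_degree: float,
                          w_first: float = 0.5, w_second: float = 0.5) -> float:
    """Weighted sum of the first matching degree (first vs. second trademark
    information, from images) and the second matching degree (first vs. third
    trademark information, from attribute data). Weights are hypothetical."""
    return w_first * first_degree + w_second * second_degree


def rerank_combined(degrees: dict[str, tuple[float, float]],
                    order: list[str],
                    w_first: float = 0.5, w_second: float = 0.5) -> list[str]:
    """degrees maps commodity_id -> (first degree, second degree); `order` is
    the current commodity-list order, which the stable sort keeps for ties."""
    return sorted(order,
                  key=lambda cid: final_matching_degree(*degrees[cid], w_first, w_second),
                  reverse=True)
```

Weighting the attribute-based degree more heavily would favor CP-supplied trademark names over image detection, which may suit catalogs with reliable attribute data.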
As an optional embodiment of this application, note that in practice a CP may mistakenly place the attribute data of the same commodity more than once in the same commodity data. Tops that differ only in size are one example: if they are all placed in the same commodity data, a single commodity may correspond to several sets of attribute data. If such a commodity ranks highly, the commodity list may repeat the attribute data of the same commodity, which degrades the user experience.
To handle this, this embodiment deduplicates the commodity list before sending it to the user terminal. For identical commodities in the list, only one commodity's attribute data and commodity image are retained. The attribute data and commodity information of the duplicates are deleted. This updates the commodity list without duplicates, making it more useful and improving the user experience. Correspondingly, S505 displays the deduplicated commodity list.
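The sketch below shows the deduplication step and keeps the highest-ranked entry for each commodity. The text does not specify how two entries are recognized as the same commodity, so the sketch takes a caller-supplied key function. The example key (name and price, ignoring size) is hypothetical.

```python
from collections.abc import Callable, Hashable
from typing import Any


def deduplicate(commodity_list: list[dict[str, Any]],
                same_commodity_key: Callable[[dict[str, Any]], Hashable]) -> list[dict[str, Any]]:
    """Keep only the first (highest-ranked) entry per commodity and drop the
    repeats, preserving the order of the remaining entries."""
    seen: set[Hashable] = set()
    result = []
    for entry in commodity_list:
        key = same_commodity_key(entry)
        if key in seen:
            continue  # a later, lower-ranked duplicate of an entry already kept
        seen.add(key)
        result.append(entry)
    return result
```

For the tops example, `deduplicate(items, lambda e: (e["name"], e["price"]))` would keep one entry per top regardless of its `size` attribute.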
S505: The user terminal displays the commodity list.
The operation of S505 is basically the same as that of S404. For the specific operation details, principles, and beneficial effects, refer to the description of S404, which is not repeated here.
As an optional embodiment of this application, the image search results (the commodity list) can be displayed, and clicks on links handled, as described with reference to FIG. 5C to FIG. 5E. In that case, the input changes from the commodity text "red wine glass" to a picture of a red wine glass.
As an optional embodiment of this application, the online process part of FIG. 4B is a schematic flowchart of a method for searching, during a search, with a commodity image uploaded by the user. It is described as follows:
The user terminal uploads the commodity image to the server through an API.
The server performs commodity detection on the commodity image and crops out the commodity according to the detection result.
The server performs category recognition on the cropped commodity image to obtain the second category.
Based on the second category, the server performs image feature analysis on the cropped commodity image to obtain the first image feature data.
The server compresses the first image feature data to obtain compressed first image feature data.
The server matches the first image feature data against the image feature data in the feature library to obtain a commodity list.
The server reranks the commodity list to obtain a sorted commodity list.
The server deduplicates the sorted commodity list and sends the deduplicated list to the user terminal.
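The text does not say how the first image feature data is compressed. As one illustrative assumption, the sketch below applies 8-bit scalar quantization. Each feature value is stored as an integer code in 0-255, plus a per-vector offset and scale. The function names are hypothetical.

```python
from collections.abc import Sequence


def quantize(features: Sequence[float]) -> tuple[list[int], float, float]:
    """Map each float to an integer code in 0..255 (one byte per value).

    Returns (codes, offset, scale) so the vector can be approximately restored.
    """
    lo, hi = min(features), max(features)
    scale = (hi - lo) / 255 or 1.0  # constant vector: avoid division by zero
    codes = [round((x - lo) / scale) for x in features]
    return codes, lo, scale


def dequantize(codes: Sequence[int], lo: float, scale: float) -> list[float]:
    """Approximate reconstruction; each value is within scale / 2 of the original."""
    return [lo + c * scale for c in codes]
```

Because the offset and scale are per vector, library vectors would need to be dequantized (or quantized over a shared range) before comparison. The reconstruction error against the storage saving would need to be evaluated on the actual features.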
For the operation details, principles, and beneficial effects of each step in this embodiment, refer to the description of the embodiment shown in FIG. 6, which is not repeated here.
In this embodiment, the commodity management system provides both text search and image search. A CP needs to store its commodity data only once to reach multiple commodity distribution channels. The system gives e-commerce platforms a more convenient commodity search function and has high practical value.
It should be noted that the information exchange, execution processes, and similar aspects of the above apparatuses/units are based on the same concept as the method embodiments of this application. For their specific functions and technical effects, refer to the method embodiments, which are not repeated here.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order. The execution order of each process should be determined by its function and internal logic, and does not limit the implementation of the embodiments of this application in any way.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or sets thereof.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the descriptions of this specification and the appended claims, the terms "first", "second", "third", and the like are used only to distinguish between descriptions and shall not be understood as indicating or implying relative importance. It should also be understood that, although the terms "first", "second", and the like are used herein to describe various elements in some embodiments of the present application, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first table could be named a second table, and similarly a second table could be named a first table, without departing from the scope of the various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference in this specification to "one embodiment", "some embodiments", or the like means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
An embodiment of the present application further provides a server. The server includes at least one memory, at least one processor, and a computer program stored in the at least one memory and executable on the at least one processor. When the processor executes the computer program, the server is enabled to implement the steps in any of the foregoing method embodiments.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps in the foregoing method embodiments are implemented.
An embodiment of the present application provides a computer program product. When the computer program product runs on a server, the server is enabled to implement the steps in the foregoing method embodiments.
An embodiment of the present application further provides a chip system. The chip system includes a processor coupled to a memory, and the processor executes a computer program stored in the memory to implement the steps in the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, all or some of the processes of the methods in the foregoing embodiments of the present application may also be completed by a computer program instructing relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or described in one embodiment, refer to the relevant descriptions of other embodiments.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The foregoing embodiments are intended only to describe the technical solutions of the present application, not to limit them. Although the present application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application; all of these shall fall within the protection scope of the present application.
Finally, it should be noted that the foregoing descriptions are merely specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

  1. A commodity data management method, characterized in that the method is applied to a server and comprises:
    obtaining commodity data, and splitting the commodity data into at least one first sub-file, wherein each first sub-file contains attribute data of at least one commodity; and
    performing attribute data verification on each first sub-file, and storing the attribute data that passes the verification in a database.
  2. The commodity data management method according to claim 1, characterized in that the performing attribute data verification on each first sub-file and storing the attribute data that passes the verification in a database comprises:
    selecting one sub-file from the at least one first sub-file as a second sub-file;
    performing attribute data verification on the second sub-file, and uploading the attribute data in the second sub-file that passes the verification to the database; and
    after the verification of the second sub-file is completed, returning to the operation of selecting one sub-file from the at least one first sub-file as the second sub-file, until all the first sub-files have been verified.
  3. The commodity data management method according to claim 1 or 2, characterized by further comprising:
    if the commodity data contains attribute data that fails the verification, obtaining exception information of the attribute data that fails the verification, and storing the exception information in the database.
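A minimal sketch of the flow in claims 1 to 3, assuming the commodity data has already been parsed into rows of string fields. The `validate` rules, the field names `id` and `price`, the sub-file size, and the in-memory `Database` are hypothetical placeholders, not the verification rules of the application. Verified rows are stored as each row is checked, which corresponds to the first option of claim 5.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]  # one commodity's attribute data, e.g. one row of a data table


def split_into_subfiles(rows: List[Row], rows_per_file: int) -> List[List[Row]]:
    """Split the commodity data into first sub-files of at most rows_per_file rows."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def validate(row: Row) -> Optional[str]:
    """Hypothetical verification rules; returns an error message, or None on success."""
    if not row.get("id"):
        return "missing id"
    price = row.get("price", "")
    try:
        if float(price) < 0:
            return "negative price"
    except ValueError:
        return f"invalid price: {price!r}"
    return None


@dataclass
class Database:
    commodities: List[Row] = field(default_factory=list)          # verified attribute data
    exceptions: List[Dict[str, str]] = field(default_factory=list)  # claim 3 exception info


def import_commodities(rows: List[Row], db: Database, rows_per_file: int = 1000,
                       validator: Callable[[Row], Optional[str]] = validate) -> None:
    pending = split_into_subfiles(rows, rows_per_file)
    while pending:                      # pick one sub-file at a time until none remain
        subfile = pending.pop(0)        # the "second sub-file"
        for row in subfile:
            error = validator(row)
            if error is None:
                db.commodities.append(row)   # stored while verification is in progress
            else:
                db.exceptions.append({"id": row.get("id", ""), "error": error})
```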
  4. The commodity data management method according to any one of claims 1 to 3, characterized in that the commodity data is data in a data table format.
  5. The commodity data management method according to any one of claims 2 to 4, characterized in that the uploading the attribute data in the second sub-file that passes the verification to the database comprises:
    uploading the attribute data in the second sub-file that passes the verification to the database while the attribute data verification is being performed on the second sub-file; or
    uploading the attribute data in the second sub-file that passes the verification to the database after the attribute data verification of the second sub-file is completed.
  6. The commodity data management method according to any one of claims 1 to 5, characterized in that the attribute data in the commodity data includes a commodity image download address, and the method further comprises:
    downloading a commodity image according to the commodity image download address included in the attribute data that passes the verification;
    performing image feature analysis on the commodity image to obtain image feature data; and
    storing the image feature data in a feature library.
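Claim 6 can be sketched in the same spirit. Here `urllib.request.urlopen` performs the download, while `toy_features` is a deliberately simple, hypothetical placeholder (a normalized byte histogram) for a real image feature analysis model; the `image_url` field name and the dictionary used as the feature library are also assumptions. The downloader is injectable so the logic can be exercised without network access.

```python
from typing import Callable, Dict, List
from urllib.request import urlopen

Row = Dict[str, str]


def download(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the commodity image bytes from its download address."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def toy_features(image_bytes: bytes, bins: int = 8) -> List[float]:
    """Hypothetical placeholder for the feature model: normalized byte-value histogram."""
    hist = [0] * bins
    for b in image_bytes:
        hist[b * bins // 256] += 1
    total = len(image_bytes) or 1
    return [h / total for h in hist]


def build_feature_library(verified_rows: List[Row],
                          feature_library: Dict[str, List[float]],
                          fetch: Callable[[str], bytes] = download,
                          analyze: Callable[[bytes], List[float]] = toy_features) -> List[str]:
    """Download each verified commodity's image, analyze it, and store the features.

    Returns the ids whose image could not be downloaded.
    """
    failed = []
    for row in verified_rows:
        url = row.get("image_url")
        if not url:
            continue                      # no download address in this row
        try:
            image = fetch(url)
        except OSError:                   # urllib.error.URLError subclasses OSError
            failed.append(row["id"])
            continue
        feature_library[row["id"]] = analyze(image)
    return failed
```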
  7. A commodity search method, characterized in that the method is applied to a server and comprises:
    receiving a first commodity image uploaded by a user terminal;
    performing image feature analysis on the first commodity image to obtain first image feature data;
    determining, from the image feature data stored in a feature library, at least one piece of second image feature data having the highest feature matching degree with the first image feature data; and
    sending, to the user terminal, second commodity images in one-to-one correspondence with the at least one piece of second image feature data, together with attribute data associated with the second commodity images, wherein the sent second commodity images and associated attribute data are sorted based on trademark information contained in the first commodity image.
  8. The commodity search method according to claim 7, characterized in that, before the sending, to the user terminal, of the second commodity images in one-to-one correspondence with the at least one piece of second image feature data and the attribute data associated with the second commodity images, the method further comprises:
    obtaining first trademark information contained in the first commodity image;
    obtaining target trademark information of each target commodity, wherein the target commodity is a commodity associated with the second image feature data, and the second commodity image and the associated attribute data are the commodity image and attribute data of the target commodity; and
    sorting the second commodity images and the attribute data of the target commodities in descending order of the information matching degree between the target trademark information and the first trademark information.
  9. The commodity search method according to claim 8, characterized in that the target trademark information comprises second trademark information and/or third trademark information;
    the second trademark information is trademark information contained in the second commodity image associated with the target commodity; and
    the third trademark information is trademark information contained in the attribute data associated with the target commodity.
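The trademark-based ordering of claims 8 and 9 could look like the following sketch. How trademark information is extracted from images or attributes is not shown; the `Candidate` fields and the use of `difflib.SequenceMatcher` as the "information matching degree" are assumptions. Candidates are assumed to arrive already ordered by feature similarity, and because Python's sort is stable (also with `reverse=True`), commodities with equal trademark scores keep that order.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Candidate:
    commodity_id: str
    image_brand: Optional[str]      # second trademark information (from the commodity image)
    attribute_brand: Optional[str]  # third trademark information (from the attribute data)


def _norm(text: str) -> str:
    """Case-insensitive, whitespace-insensitive form used for comparison."""
    return "".join(text.lower().split())


def brand_match_degree(query_brand: str, cand: Candidate) -> float:
    """Best similarity in [0, 1] between the first trademark and any available target trademark."""
    q = _norm(query_brand)
    scores = [SequenceMatcher(None, q, _norm(b)).ratio()
              for b in (cand.image_brand, cand.attribute_brand) if b]
    return max(scores, default=0.0)


def rerank_by_brand(query_brand: Optional[str], candidates: List[Candidate]) -> List[Candidate]:
    """Sort by descending trademark match; stable, so ties keep feature-similarity order."""
    if not query_brand:
        return list(candidates)
    return sorted(candidates, key=lambda c: brand_match_degree(query_brand, c), reverse=True)
```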
  10. The commodity search method according to any one of claims 7 to 9, characterized in that the performing image feature analysis on the first commodity image to obtain first image feature data comprises:
    performing image feature analysis on the first commodity image by using a pre-trained image feature analysis model to obtain the first image feature data, wherein the image feature analysis model is a model extracted from a neural network model trained on commodity image samples and attribute data samples of a plurality of commodity samples.
  11. A commodity data management system, characterized by comprising a first server, a second server, and a database, wherein
    the first server is configured to obtain commodity data and split the commodity data into at least one first sub-file, wherein each first sub-file contains attribute data of at least one commodity; and
    the second server is configured to perform attribute data verification on each first sub-file and store the attribute data that passes the verification in the database.
  12. The commodity data management system according to claim 11, characterized in that the performing attribute data verification on each first sub-file and storing the attribute data that passes the verification in the database specifically comprises:
    the second server selects one sub-file from the at least one first sub-file as a second sub-file;
    the second server performs attribute data verification on the second sub-file, and uploads the attribute data in the second sub-file that passes the verification to the database; and
    after completing the verification of the second sub-file, the second server returns to the operation of obtaining one sub-file of the at least one first sub-file, until all the first sub-files have been verified.
  13. The commodity data management system according to claim 12, characterized in that, before the selecting one sub-file from the at least one first sub-file as the second sub-file, the following is further included:
    the first server creates, in the database, first subtasks in one-to-one correspondence with the first sub-files;
    the selecting, by the second server, one sub-file from the at least one first sub-file as the second sub-file comprises:
    the second server determines a second subtask from the first subtasks stored in the database, and obtains the sub-file, of the at least one first sub-file, that is associated with the second subtask;
    the returning, by the second server, to the operation of obtaining one sub-file of the at least one first sub-file until all the first sub-files have been verified comprises:
    the second server returns to the operation of determining a second subtask from the first subtasks stored in the database, until all the first subtasks have been executed.
  14. The commodity data management system according to claim 13, characterized in that the operation of determining a second subtask from the first subtasks stored in the database comprises:
    the second server sends a task query request to the database;
    in response to the received task query request, the database selects to-be-executed subtasks from the first subtasks and sends the to-be-executed subtasks to the second server, wherein the to-be-executed subtasks include first subtasks that have not been executed, and first subtasks that are being executed and whose execution duration exceeds a duration threshold; and
    the second server determines the second subtask from the received to-be-executed subtasks.
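The subtask bookkeeping of claims 13 and 14 maps naturally onto a table. The sketch below uses an in-memory SQLite database through Python's `sqlite3` module; the table name, columns, and status values are hypothetical. The key point is the query: a subtask is executable if it has never started, or if it has been running longer than the duration threshold, which lets another server take over work from a server that stalled or crashed.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS subtask (
    id           INTEGER PRIMARY KEY,
    subfile_path TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'PENDING'
                 CHECK (status IN ('PENDING', 'RUNNING', 'DONE')),
    started_at   REAL
)"""


def create_subtasks(conn: sqlite3.Connection, subfile_paths: List[str]) -> None:
    """First server: create one subtask row per first sub-file."""
    conn.executemany("INSERT INTO subtask (subfile_path) VALUES (?)",
                     [(p,) for p in subfile_paths])


def mark_running(conn: sqlite3.Connection, task_id: int, now: Optional[float] = None) -> None:
    """Record that a server started executing the subtask."""
    now = time.time() if now is None else now
    conn.execute("UPDATE subtask SET status = 'RUNNING', started_at = ? WHERE id = ?",
                 (now, task_id))


def query_executable_subtasks(conn: sqlite3.Connection, timeout_s: float,
                              now: Optional[float] = None) -> List[Tuple[int, str]]:
    """Unexecuted subtasks, plus running ones whose duration exceeds timeout_s."""
    now = time.time() if now is None else now
    return conn.execute(
        "SELECT id, subfile_path FROM subtask "
        "WHERE status = 'PENDING' "
        "   OR (status = 'RUNNING' AND started_at < ?) "
        "ORDER BY id",
        (now - timeout_s,),
    ).fetchall()
```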
  15. The commodity data management system according to claim 14, characterized in that the operation in which the second server determines the second subtask from the received to-be-executed subtasks comprises:
    the second server requests, in sequence, a distributed lock on each to-be-executed subtask from a cache component; and
    when the second server obtains the distributed lock on a single to-be-executed subtask, the second server takes that to-be-executed subtask as the second subtask.
  16. The commodity data management system according to claim 15, characterized in that, during the attribute data verification of the second sub-file, the second server is further configured to:
    determine whether the verification duration of the second sub-file reaches a duration threshold; and
    if the verification duration of the second sub-file reaches the duration threshold, release the distributed lock on the second sub-file.
  17. The commodity data management system according to claim 16, characterized by further comprising the cache component, wherein
    the cache component is configured to start timing after allocating the distributed lock on the second sub-file to the second server; and
    the cache component is further configured to release the distributed lock on the second sub-file when the timed duration reaches the duration threshold.
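Claims 15 to 17 describe claiming a subtask through a distributed lock with a time limit. The class below is an in-memory stand-in for the cache component, loosely modeled on the set-if-absent-with-expiry pattern commonly used with Redis (`SET key value NX PX ms`); the class, key format, and method names are hypothetical. Expiry handled by the cache corresponds to claim 17, while a server calling `release` itself when its verification exceeds the threshold corresponds to claim 16.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class LockCache:
    """In-memory stand-in for the cache component (not a real distributed store)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def acquire(self, key: str, owner: str, ttl_s: float) -> bool:
        """Grant the lock if it is free or expired; timing starts at the grant."""
        now = self._clock()
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False
        self._locks[key] = (owner, now + ttl_s)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Release only a lock this owner still holds."""
        held = self._locks.get(key)
        if held is not None and held[0] == owner:
            del self._locks[key]
            return True
        return False


def claim_next_subtask(cache: LockCache, pending_ids: Iterable[int],
                       server_id: str, ttl_s: float) -> Optional[int]:
    """Request locks in sequence; the first one obtained becomes the second subtask."""
    for task_id in pending_ids:
        if cache.acquire(f"subtask-lock:{task_id}", server_id, ttl_s):
            return task_id
    return None
```

Releasing checks the owner, so a server whose lock has already expired and been taken over cannot release the new holder's lock.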
  18. A server, characterized in that the server comprises a memory and a processor, the memory stores a computer program executable on the processor, and when executing the computer program, the processor implements the steps of the method according to any one of claims 1 to 6.
  19. A chip system, characterized in that the chip system comprises a processor coupled to a memory, and the processor executes a computer program stored in the memory to implement the commodity data management method according to any one of claims 1 to 6.
PCT/CN2021/116999 2020-10-23 2021-09-07 Commodity data management method and apparatus, and server WO2022083332A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011152745.1 2020-10-23
CN202011152745.1A CN114528343A (en) 2020-10-23 2020-10-23 Commodity data management method and device and server

Publications (1)

Publication Number Publication Date
WO2022083332A1 true WO2022083332A1 (en) 2022-04-28

Family

ID=81291605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116999 WO2022083332A1 (en) 2020-10-23 2021-09-07 Commodity data management method and apparatus, and server

Country Status (2)

Country Link
CN (1) CN114528343A (en)
WO (1) WO2022083332A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115408277A (en) * 2022-08-29 2022-11-29 南京领行科技股份有限公司 Interface testing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765891A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Searching shopping method based on pictures
CN105869048A (en) * 2016-03-28 2016-08-17 中国建设银行股份有限公司 Data processing method and system
CN105989043A (en) * 2015-02-04 2016-10-05 阿里巴巴集团控股有限公司 Method and device for automatically acquiring trademark in commodity image and searching trademark
CN109472534A (en) * 2018-11-15 2019-03-15 深圳市福尔科技有限公司 A kind of product name correlating method and system
CN110209643A (en) * 2019-04-23 2019-09-06 深圳壹账通智能科技有限公司 A kind of data processing method and device
CN111598535A (en) * 2020-05-09 2020-08-28 西安精雕软件科技有限公司 Basic material importing method and system and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612180A (en) * 2023-07-18 2023-08-18 湖南省计量检测研究院 Commodity quantity detecting system capable of being maintained in real time
CN116612180B (en) * 2023-07-18 2023-10-03 湖南省计量检测研究院 Commodity quantity detecting system capable of being maintained in real time

Also Published As

Publication number Publication date
CN114528343A (en) 2022-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21881751

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21881751

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/09/2023)