CN107239399B - Index generation method, device and system for testing and readable storage medium - Google Patents

Index generation method, device and system for testing and readable storage medium

Info

Publication number
CN107239399B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710390849.8A
Other languages
Chinese (zh)
Other versions
CN107239399A (en)
Inventor
赵晶晶
李友科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710390849.8A
Publication of CN107239399A
Application granted
Publication of CN107239399B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an index generation method and device for search engine testing, a search engine testing system and a readable storage medium. The method comprises the following steps: acquiring a uniform resource locator list; acquiring an identification code list of data according to the uniform resource locator list; and generating a customized index according to the identification code list of the data. The method can guarantee testing of the search engine while greatly reducing the size of the index, and provides the conditions for system testing.

Description

Index generation method, device and system for testing and readable storage medium
Technical Field
The invention relates to the technical field of internet, in particular to an index generation method and device for search engine testing, a search engine testing system and a readable storage medium.
Background
With the increasing number of commodities in the e-commerce platform system, the requirement on the search engine is higher and higher, and a new search engine needs to be developed or the functions of the existing search engine need to be improved continuously so as to adapt to the increasing number of commodities in the platform. Before a new search engine or a new function of a search engine comes online, it needs to be tested, such as a smoke test, a regression test, a system test, and the like.
In existing search engine testing processes, a full index is typically employed. The full index comprises all commodity data in the e-commerce platform, and all of this commodity data is loaded into the memory of the test server during testing. However, this approach has the following drawbacks:
firstly, the test server takes too long to load the index at startup, so that the speed of the smoke test and the regression test cannot be guaranteed;
secondly, because the startup time of the server is too long, system tests that need to restart the search server frequently cannot be carried out;
thirdly, because the full index is too large, the test server requires a large memory allocation.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present invention provides an index generation method and apparatus for search engine testing, a search engine testing system and a readable storage medium, which can guarantee testing of the search engine while greatly reducing the size of the index, and which provide the conditions for system testing.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of the present invention, there is provided an index generation method for search engine testing, including: acquiring a uniform resource locator list; acquiring an identification code list of data according to the uniform resource locator list; and generating a customized index according to the identification code list of the data.
According to some embodiments of the invention, obtaining the list of uniform resource locators comprises: extracting a plurality of search terms with highest search frequency; and obtaining a uniform resource locator list from the plurality of search terms.
According to some embodiments of the invention, obtaining the list of uniform resource locators comprises: acquiring a uniform resource locator list according to the test requirements of the search engine.
According to some embodiments of the invention, the data comprises commodity data, and generating the customized index from the list of identification codes of the data comprises: acquiring information of each commodity data in an identification code list of the data; according to the classification in the information of each commodity data, acquiring a label field of each commodity data; and generating a customized index according to the information and the label field of each commodity data.
According to some embodiments of the present invention, generating the customized index according to the information and the tag field of each commodity data comprises: dividing all commodity data in the identification code list of the data into a plurality of hash fragments according to the identification code of each commodity data in the identification code list of the data, wherein each hash fragment contains the identification code of part of the commodity data in the identification code list of the data; distributing the plurality of hash fragments to a plurality of servers; in a plurality of servers, respectively generating a plurality of partial indexes according to the information and the label fields of the commodity data in the distributed hash fragments; and sorting the commodity data in the plurality of partial indexes to generate a customized index.
According to some embodiments of the invention, the method further comprises: sending the customized index to the test equipment of the search engine so as to test the search engine according to the customized index.
According to another aspect of the present invention, there is provided an index generation apparatus for search engine testing, including: the locator list acquisition module is used for acquiring a uniform resource locator list; the identification code list acquisition module is used for acquiring an identification code list of data according to the uniform resource locator list; and the customized index generation module is used for generating a customized index according to the identification code list of the data.
According to some embodiments of the invention, the locator list acquisition module comprises: the search word extraction submodule is used for extracting a plurality of search words with the highest search frequency; and a first locator obtaining sub-module for obtaining a uniform resource locator list from the plurality of search terms.
According to some embodiments of the invention, the locator list acquisition module comprises: a second locator acquisition submodule, used for acquiring the uniform resource locator list according to the test requirements of the search engine.
According to some embodiments of the invention, the customized index generation module comprises: the information acquisition submodule is used for acquiring information of each commodity data in the identification code list of the data; the field acquisition submodule is used for acquiring the label field of each commodity data according to the classification in the information of each commodity data; and the index generation submodule is used for generating the customized index according to the information and the label field of each commodity data.
According to some embodiments of the invention, the index generation submodule comprises: a hash fragment dividing unit, used for dividing all the commodity data in the identification code list of the data into a plurality of hash fragments according to the identification code of each commodity data in the identification code list of the data, each hash fragment containing the identification codes of part of the commodity data in the identification code list of the data; a hash fragment distribution unit, used for distributing the hash fragments to a plurality of servers; a partial index generating unit, configured to generate, in the plurality of servers, a plurality of partial indexes according to the information and the tag fields of the commodity data in the distributed hash fragments, respectively; and a final index generating unit, used for sorting the commodity data in the plurality of partial indexes to generate a customized index.
According to some embodiments of the invention, the apparatus further comprises: a customized index sending module, used for sending the customized index to the test equipment of the search engine so as to test the search engine according to the customized index.
According to still another aspect of the present invention, there is provided a search engine testing system including: a big data set server for generating a customized index according to any one of the methods described above; and the search engine test server is used for testing the search engine according to the customized index.
According to some embodiments of the invention, the big data set server is a Hadoop cluster server.
According to still another aspect of the present invention, there is provided a computer apparatus comprising: a memory, a processor, and executable instructions stored in the memory and executable in the processor, which when executed by the processor, implement any of the methods described above.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement any of the methods described above.
According to the index generation method for search engine testing, the required uniform resource locator list is obtained, the identification code list of the corresponding data is obtained, and the customized index is generated from it. Because the index no longer has to be built from all of the data, its size is greatly reduced and the test server needs less time to load it, so the smoke test and the regression test can be completed quickly and code iteration can proceed; this also provides the conditions for carrying out system tests.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a block diagram of a search engine testing system, according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method for index generation for search engine testing, according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating another index generation method for search engine testing in accordance with an exemplary embodiment.
FIG. 4 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment.
FIG. 5 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment.
FIG. 6 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating an index generation apparatus for search engine testing in accordance with an exemplary embodiment.
FIG. 8 is a block diagram illustrating another index generation apparatus for search engine testing in accordance with an illustrative embodiment.
FIG. 9 is a block diagram illustrating yet another index generation apparatus for search engine testing in accordance with an illustrative embodiment.
FIG. 10 is a block diagram illustrating yet another index generation apparatus for search engine testing in accordance with an exemplary embodiment.
FIG. 11 is a schematic diagram of a continuous integration platform shown according to an example.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
FIG. 1 is a block diagram of a search engine testing system, according to an exemplary embodiment. As shown in FIG. 1, the system includes: a big data set server 1 and a search engine test server 2.
The big data set server 1 is used for generating the customized index required by the search engine test server 2 during testing. The big data set server 1 may be a Hadoop cluster server or a single server, which is not limited in the present invention.
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It enables users to develop distributed programs without knowing the details of the underlying distributed layer, and to make full use of the power of a cluster for high-speed computation and storage. The core of the Hadoop framework design consists of HDFS (Hadoop Distributed File System) and MapReduce. HDFS provides storage for massive data, and MapReduce provides computation over massive data.
The search engine test server 2 may be a single server or a plurality of servers, used to complete different tests such as a smoke test, a regression test, a system test, and the like.
FIG. 2 is a flow diagram illustrating a method for index generation for search engine testing, according to an exemplary embodiment. As shown in fig. 2, the method 10 includes:
in step S102, a uniform resource locator list is acquired.
A Uniform Resource Locator (URL) is a compact representation of a Resource and access method available from the internet, and is the address of a standard Resource on the internet. Before generating the index, the required list of uniform resource locators is first obtained.
In step S104, a list of identification codes of the data is obtained according to the list of uniform resource locators.
The data obtained according to the uniform resource locator list may include commodity data to be searched in the e-commerce platform, web page data to be searched by a web page search engine, file data to be searched by a file search engine, and the like, and the invention is not limited thereto.
After the data is acquired, the identification codes (ID) of the data are combined into an identification code list of the data.
The identification code of the data is used to uniquely represent each piece of data. For example, if the data is merchandise data, the identification code of the data is used to uniquely represent each merchandise to be sold in the e-commerce platform.
Search engine testing mainly works by searching for specific words and checking whether the returned results are as expected. By using an index composed of the specific data obtained through the URL list, it can be ensured that each function point of the search engine is triggered by searching the corresponding word when the search engine is tested.
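The checking described above can be sketched as follows. This is a minimal illustration only: the in-memory `small_index`, the word-matching `search` function and the `run_case` helper are hypothetical stand-ins and are not part of the disclosure.

```python
def search(index, word):
    """Return the sorted IDs of all items whose name contains the word."""
    word = word.lower()
    return sorted(
        item_id
        for item_id, item in index.items()
        if word in item["name"].lower().split()
    )


def run_case(index, word, expected_ids):
    """A test case passes when the returned IDs match the expected IDs."""
    return search(index, word) == sorted(expected_ids)


# Hypothetical reduced index holding only the data needed by the test cases.
small_index = {
    101: {"name": "Phone A"},
    102: {"name": "Phone case"},
    201: {"name": "Cotton shirt"},
}

print(run_case(small_index, "phone", [101, 102]))  # True
print(run_case(small_index, "shirt", [101]))       # False
```

Each search word exercises one function point only if the data it is expected to return is present in the index, which is why the index is built from the data behind the test URLs.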
In step S106, a customized index is generated from the list of identification codes of the data.
A customized index is generated from each piece of data in the identification code list of the data.
According to the index generation method for search engine testing, the required uniform resource locator list is obtained, the identification code list of the corresponding data is obtained, and the customized index is generated from it. Because the index no longer has to be built from all of the data, its size is greatly reduced and the test server needs less time to load it, so the smoke test and the regression test can be completed quickly and code iteration can proceed; this also provides the conditions for carrying out system tests.
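Steps S102 to S106 can be sketched end to end as below. This is a simplified illustration under stated assumptions: `FAKE_SEARCH_RESULTS` and `FULL_DATA` are hypothetical in-memory stand-ins for the search backend and the full data set, and the example URLs are invented.

```python
# Hypothetical stand-in for the search backend: URL -> IDs of returned data.
FAKE_SEARCH_RESULTS = {
    "https://search.example.com/s?q=phone": [101, 102],
    "https://search.example.com/s?q=shirt": [201, 102],
}

# Hypothetical full data set, keyed by identification code.
FULL_DATA = {
    101: {"name": "phone A"},
    102: {"name": "phone case"},
    201: {"name": "shirt"},
    999: {"name": "item never returned by the test URLs"},
}


def acquire_url_list():
    """Step S102: acquire the uniform resource locator list."""
    return list(FAKE_SEARCH_RESULTS)


def acquire_id_list(urls):
    """Step S104: collect the de-duplicated IDs of the data behind each URL."""
    seen = set()
    ids = []
    for url in urls:
        for item_id in FAKE_SEARCH_RESULTS.get(url, []):
            if item_id not in seen:
                seen.add(item_id)
                ids.append(item_id)
    return ids


def generate_custom_index(ids):
    """Step S106: build an index that contains only the listed data."""
    return {item_id: FULL_DATA[item_id] for item_id in ids}


custom_index = generate_custom_index(acquire_id_list(acquire_url_list()))
print(sorted(custom_index))  # [101, 102, 201]
```

Item 999 is never returned by any URL in the list, so it is left out of the customized index; this is the source of the size reduction.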
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 3 is a flow diagram illustrating another index generation method for search engine testing in accordance with an exemplary embodiment. The steps shown in fig. 3 provide one specific implementation for step S102 in fig. 2. As shown in fig. 3, step S102 includes:
in step S1022, several search terms with the highest search frequency are extracted.
The search terms with the highest recent search frequency (TOP search terms) may be extracted from the big data set server 1 in FIG. 1; for example, 15,000 TOP search terms may be extracted for obtaining the URL list.
In step S1024, a URL list is acquired from the extracted search terms.
Customizing the index from the hot search terms with the highest recent frequency ensures the test coverage of the search engine, and thus the test quality.
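Steps S1022 and S1024 can be sketched as follows, assuming the recent search log is available as a plain list of query strings. The `SEARCH_BASE` address and the `keyword` parameter name are hypothetical; the disclosure does not specify the URL format.

```python
from collections import Counter
from urllib.parse import urlencode

SEARCH_BASE = "https://search.example.com/search"  # hypothetical endpoint


def top_search_terms(query_log, n):
    """Step S1022: return the n most frequent search terms in the log."""
    counts = Counter(query_log)
    return [term for term, _ in counts.most_common(n)]


def urls_from_terms(terms):
    """Step S1024: turn each search term into a search URL."""
    return [f"{SEARCH_BASE}?{urlencode({'keyword': term})}" for term in terms]


query_log = ["phone", "shirt", "phone", "laptop", "phone", "shirt"]
terms = top_search_terms(query_log, 2)
urls = urls_from_terms(terms)
print(terms)  # ['phone', 'shirt']
print(urls[0])  # https://search.example.com/search?keyword=phone
```

In the example from the description, `n` would be 15,000 rather than 2.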
FIG. 4 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment. The steps shown in fig. 4 provide yet another specific implementation for step S102 in fig. 2. As shown in fig. 4, step S102 includes:
in step S1022', a URL list is obtained according to the search engine test requirement.
By customizing the URL list according to the test requirements, different requirements for testing the search engine can be met, and therefore the test on each functional point in the search engine can be effectively triggered.
In some embodiments, the obtained URL list may include URLs obtained according to the method shown in fig. 3 and the method shown in fig. 4, so as to satisfy the requirements of test breadth and specificity at the same time.
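Combining the two sources of URLs can be sketched as a simple order-preserving merge. The function name `merge_url_lists` and the example URLs are hypothetical.

```python
def merge_url_lists(*url_lists):
    """Concatenate URL lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(url for urls in url_lists for url in urls))


top_term_urls = ["https://search.example.com/s?q=phone",
                 "https://search.example.com/s?q=shirt"]
requirement_urls = ["https://search.example.com/s?q=shirt",
                    "https://search.example.com/s?q=phone&sort=price"]

merged = merge_url_lists(top_term_urls, requirement_urls)
print(len(merged))  # 3
```

De-duplicating here keeps the later identification code list from requesting the same URL twice.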
FIG. 5 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment. The steps shown in fig. 5 provide one specific implementation for step S106 in fig. 2. In this embodiment, the data includes data of commodities to be sold in the e-commerce platform, and as shown in fig. 5, step S106 includes:
in step S1062, information of each commodity data in the identification code list of the data is acquired.
For example, information of each commodity data in the identification code list of the data, such as name, classification, price, number of comments, and the like, may be acquired from a Hive table of the big data set server 1. Hive is a data warehouse tool based on Hadoop that can map structured data files to database tables and provides an SQL-like query function.
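The lookup in step S1062 can be illustrated with an SQL query over the identification code list. The sketch below uses Python's built-in `sqlite3` module purely as a local stand-in for a Hive table; the table name `items` and its columns are assumptions, and a real deployment would issue a comparable SQL-like query against Hive instead.

```python
import sqlite3


def fetch_item_info(conn, id_list):
    """Return (id, name, category, price, comment_count) rows for the IDs."""
    if not id_list:
        return []
    placeholders = ",".join("?" for _ in id_list)
    query = (
        "SELECT id, name, category, price, comment_count "
        f"FROM items WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(query, list(id_list)).fetchall()


# Hypothetical table standing in for the Hive table of commodity data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
    "category TEXT, price REAL, comment_count INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [
        (101, "phone A", "electronics", 1999.0, 350),
        (201, "shirt", "clothing", 99.0, 42),
        (999, "unused item", "clothing", 10.0, 0),
    ],
)

rows = fetch_item_info(conn, [201, 101])
print([row[0] for row in rows])  # [101, 201]
```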
In step S1064, the tag field of each commodity data is acquired according to the classification in the information of each commodity data.
The information of the commodity data includes the classification of each commodity, such as clothing, electronic products, and the like. The tag fields differ from one classification to another: for example, clothing commodity data may include tags such as color and style, while electronic product data may include tags such as color and storage size. These tag fields determine where the commodity data is placed during a search.
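The classification-to-tag lookup of step S1064 can be sketched as a dictionary lookup. The mapping below only restates the two examples from the text; the key names and the `tag_fields_for` helper are hypothetical.

```python
# Tag fields per classification, taken from the examples in the description.
TAG_FIELDS_BY_CATEGORY = {
    "clothing": ["color", "style"],
    "electronics": ["color", "storage_size"],
}


def tag_fields_for(item):
    """Return the tag fields that apply to an item, based on its category."""
    return TAG_FIELDS_BY_CATEGORY.get(item["category"], [])


shirt = {"id": 201, "name": "shirt", "category": "clothing"}
phone = {"id": 101, "name": "phone A", "category": "electronics"}
print(tag_fields_for(shirt))  # ['color', 'style']
print(tag_fields_for(phone))  # ['color', 'storage_size']
```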
In step S1066, a customized index is generated based on the information and the tag field of each commodity data.
When the big data set server 1 is a Hadoop cluster server, the operation of generating the customized index may be implemented, for example, by the Hadoop MapReduce function in the big data set server 1.
FIG. 6 is a flow diagram illustrating yet another index generation method for search engine testing in accordance with an exemplary embodiment. The steps shown in fig. 6 provide a specific implementation for step S1066 in fig. 5. As shown in fig. 6, step S1066 includes:
in step S10662, according to the identification code of each commodity data in the identification code list of the data, all the commodity data in the identification code list of the data are divided into a plurality of hash fragments, and each hash fragment contains the identification codes of part of the commodity data in the identification code list of the data.
When the big data set server 1 is a Hadoop cluster server, the size of the index needs to be kept within a reasonable range because the hardware of a single server is limited. The index therefore needs to be fragmented, that is, generated in a distributed manner.
In step S10664, the plurality of hash fragments are distributed to a plurality of servers.
The divided hash fragments are distributed to different single servers, so that each server generates an index whose size suits its hardware resources.
In step S10666, the plurality of servers generate a plurality of partial indexes based on the information and the tag fields of the commodity data in the assigned hash fragments, respectively.
In step S10668, the commodity data in the plurality of partial indexes are sorted to generate a customized index.
The commodity data in the plurality of partial indexes are sorted in forward or reverse order to obtain the final customized index, which can be recognized by the search program and used to provide search results externally.
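Steps S10662 to S10668 can be sketched in a single process as below. Several choices are assumptions of this sketch and are not stated in the text: CRC32 modulo the fragment count is used as the hash, each list stands in for one server, and the sort key is the identification code.

```python
import heapq
import zlib
from collections import defaultdict


def shard_of(item_id, num_shards):
    """Map an identification code to a hash fragment number (stable hash)."""
    return zlib.crc32(str(item_id).encode("utf-8")) % num_shards


def split_into_shards(id_list, num_shards):
    """Step S10662: divide the IDs into hash fragments."""
    shards = defaultdict(list)
    for item_id in id_list:
        shards[shard_of(item_id, num_shards)].append(item_id)
    return dict(shards)


def build_partial_index(shard_ids, item_info):
    """Step S10666: one server indexes only its fragment, sorted by ID."""
    entries = [(item_id, item_info[item_id]) for item_id in shard_ids]
    return sorted(entries, key=lambda entry: entry[0])


def merge_partial_indexes(partials, reverse=False):
    """Step S10668: merge the sorted partial indexes into one sorted index."""
    merged = list(heapq.merge(*partials, key=lambda entry: entry[0]))
    if reverse:
        merged.reverse()
    return merged


item_info = {i: {"name": f"item {i}"} for i in [305, 101, 202, 404, 150]}
shards = split_into_shards(list(item_info), num_shards=3)
# Step S10664 is simulated: each fragment would go to a different server.
partials = [build_partial_index(ids, item_info) for ids in shards.values()]
custom_index = merge_partial_indexes(partials)
print([item_id for item_id, _ in custom_index])  # [101, 150, 202, 305, 404]
```

Because every partial index is already sorted, the final step is a merge rather than a full re-sort, which keeps the work on the merging node small.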
The generation of the customized index can fully reuse the program that produces the full index, so that when the program code for producing the full index changes, the customized index is updated correspondingly in real time.
In some embodiments, the method 10 may further include step S108, in which the customized index is sent to a test device of the search engine so that the search engine is tested according to the customized index.
After the customized index is generated, it is automatically pushed, according to the test equipment list, to the smoke test, regression test and system test environments, such as the search engine test server 2 shown in FIG. 1. During the push, tests that are pending or in progress are not affected. After the push succeeds, the old index is automatically deleted to keep disk space available on the test equipment.
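The push-then-clean-up behaviour can be sketched with local directories standing in for test devices. The directory layout (`index-<version>`) and the `push_index` helper are hypothetical; copying into a fresh directory is one possible way to leave a running test untouched, not necessarily the mechanism used in the disclosure.

```python
import shutil
import tempfile
from pathlib import Path


def push_index(new_index_dir: Path, device_root: Path, version: str) -> Path:
    """Copy a new index next to the current one, then delete older indexes."""
    target = device_root / f"index-{version}"
    # The copy goes to a fresh directory, so an index that is currently in
    # use is not modified while the push is running (assumption of the sketch).
    shutil.copytree(new_index_dir, target)
    # Only after the copy has succeeded are older indexes removed.
    for old in list(device_root.glob("index-*")):
        if old != target:
            shutil.rmtree(old)
    return target


with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    device = Path(tmp) / "device"
    build.mkdir()
    device.mkdir()
    (build / "part-0").write_text("index data")
    push_index(build, device, "v1")
    push_index(build, device, "v2")
    remaining = sorted(p.name for p in device.iterdir())

print(remaining)  # ['index-v2']
```

If the copy raises an exception, the clean-up loop is never reached, so the previous index stays available on the device.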
About 90% of the test server's startup time is spent loading the index, mainly reading it into memory, and the index entries are read and loaded one by one. The complete index is about 60GB, reading it into the machine's memory takes more than 15 minutes, and this time keeps growing as the number of commodities increases. However, the smoke test, the regression test and the abnormal test frequently require the server to be restarted, and each restart takes 30 minutes before testing can proceed. This is unacceptable in a frequently iterated development mode. The customized index selects specific commodity data according to the functional points to be tested, and this data triggers the corresponding functional code to complete the test. The customized index can be reduced to about 3GB, so that the server can be restarted within 2 minutes, which improves testing efficiency. Testing also remains effective, because the selected commodity data can trigger all functional points.
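The figures quoted above can be put side by side with a short calculation. It only restates the numbers from the paragraph (60GB versus about 3GB, 30 minutes versus at most 2 minutes); the variable names are illustrative.

```python
full_index_gb = 60
custom_index_gb = 3
full_restart_min = 30
custom_restart_min = 2  # upper bound: "within 2 minutes"

size_reduction = 1 - custom_index_gb / full_index_gb
restart_speedup = full_restart_min / custom_restart_min

print(f"index size reduced by {size_reduction:.0%}")        # 95%
print(f"restart at least {restart_speedup:.0f}x faster")    # 15x
```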
After the customized index has been used to test until the code version is stable, the complete index is used for a final comprehensive stress test, which effectively saves time in the early stages. For example, problems such as new code failing to start or functions failing can be detected quickly.
In the smoke test, basic flow checks are performed using the TOP 20 search terms. In the regression test, all test URLs in use are added to the URL list from which the customized index is produced, so every function that needs to be tested is covered. In the system test, the management node test of the search system is completed. The management node test mainly covers restarting, deleting and adding service nodes, and involves a large number of start and stop operations on index servers. Because the test server can be started in about 1 minute, system testing can be carried out as required. With the customized index, the smoke test and the regression test are finished quickly and code iteration is completed, and system testing of the search system becomes feasible.
According to the index generation method for search engine testing provided by the embodiment of the invention, on the premise of ensuring the test quality, the smoke test efficiency can be improved by more than 50%, and the server startup time is reduced to 1 minute and 20 seconds; the continuous integration process is improved so that system testing can be completed; and server resources are saved, since the servers for the smoke test and the system test are replaced by 32GB Docker servers.
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented as computer programs executed by a CPU. The computer program, when executed by the CPU, performs the functions defined by the method provided by the present invention. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
FIG. 11 is a schematic diagram of a continuous integration platform shown according to an example. As shown in FIG. 11, based on the system shown in FIG. 1, an easy-to-use continuous integration platform can be established on the big data set server 1 and the search engine test server 2 for monitoring problems in integration, providing detailed log files and reminder functions, and graphically displaying the trend and stability of project builds. As shown in FIG. 11, in the big data set server 1 the continuous integration platform obtains TOP search terms from the data in the big data set, generates the URL list, obtains the information of the commodity data, and triggers generation of the customized index. It then deploys the customized index to the search engine test server 2, where the smoke test, regression test, system test, and simulation environment test (Staging Test) are performed.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
FIG. 7 is a block diagram illustrating an index generation apparatus for search engine testing in accordance with an exemplary embodiment. As shown in fig. 7, the apparatus 20 includes: a locator list acquisition module 202, an identification code list acquisition module 204 and a customized index generation module 206.
The locator list acquisition module 202 is configured to acquire a uniform resource locator list.
The identification code list acquisition module 204 is configured to acquire an identification code list of data according to the uniform resource locator list.
The customized index generation module 206 is configured to generate a customized index according to the identification code list of the data.
In some embodiments, the apparatus 20 may further comprise a customized index sending module 208, configured to send the customized index to a test device of the search engine so that the search engine is tested according to the customized index.
FIG. 8 is a block diagram illustrating another index generation apparatus for search engine testing in accordance with an illustrative embodiment. The difference from the apparatus 20 shown in FIG. 7 is that the locator list acquisition module 302 of the apparatus 30 shown in FIG. 8 includes a search term extraction sub-module 3022 and a first locator acquisition sub-module 3024.
The search term extraction sub-module 3022 is configured to extract a plurality of search terms with the highest search frequency.
The first locator acquisition sub-module 3024 is configured to acquire the uniform resource locator list from the plurality of search terms.
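The two sub-modules of FIG. 8 can be illustrated with a short sketch that counts search terms in a query log and turns the most frequent ones into search URLs. The log format, the base URL `http://search.example/search`, and the `keyword` query parameter are assumptions made for illustration; the patent does not define them.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlencode

def top_search_terms(query_log: Iterable[str], n: int) -> List[str]:
    """Return the n most frequently searched terms (role of sub-module 3022)."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return [term for term, _ in counts.most_common(n)]

def build_url_list(terms: Iterable[str],
                   base_url: str = "http://search.example/search") -> List[str]:
    """Build one search URL per term (role of sub-module 3024).

    The base URL and the 'keyword' parameter name are hypothetical.
    """
    return [f"{base_url}?{urlencode({'keyword': term})}" for term in terms]

log = ["phone", "laptop", "phone", "tv", "phone", "laptop"]
terms = top_search_terms(log, 2)
urls = build_url_list(terms)
```

Normalizing case and whitespace before counting is a design choice of this sketch, so that "Phone" and "phone " count as the same term.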
FIG. 9 is a block diagram illustrating yet another index generation apparatus for search engine testing in accordance with an illustrative embodiment. The difference from the apparatus 20 shown in FIG. 7 is that the locator list acquisition module 402 of the apparatus 40 shown in FIG. 9 includes a second locator acquisition sub-module 4022, which is configured to acquire a uniform resource locator list according to a test requirement of the search engine.
FIG. 10 is a block diagram illustrating yet another index generation apparatus for search engine testing in accordance with an exemplary embodiment. The difference from the apparatus 20 shown in FIG. 7 is that the customized index generation module 506 of the apparatus 50 shown in FIG. 10 includes an information acquisition sub-module 5062, a field acquisition sub-module 5064 and an index generation sub-module 5066.
The information acquisition sub-module 5062 is configured to acquire information of each commodity data in the identification code list of the data.
The field acquisition sub-module 5064 is configured to acquire a tag field of each commodity data according to the classification in the information of each commodity data.
The index generation sub-module 5066 is configured to generate a customized index according to the information and the tag field of each commodity data.
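One possible reading of sub-modules 5062 to 5066 is sketched below: the classification of each commodity selects which attributes become its tag fields, and the index record combines the information with those tags. The `CATEGORY_TAGS` mapping, the record layout, and the sample catalog are hypothetical and serve only to make the three steps concrete.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from classification to the attributes used as tag fields.
CATEGORY_TAGS: Dict[str, List[str]] = {
    "phone": ["brand", "screen_size"],
    "book": ["author", "publisher"],
}

def tag_fields_for(info: dict) -> Dict[str, object]:
    """Pick the tag fields according to the classification (role of sub-module 5064)."""
    field_names = CATEGORY_TAGS.get(info.get("category"), [])
    return {name: info.get(name) for name in field_names}

def build_index_records(ids: List[str],
                        fetch_info: Callable[[str], dict]) -> Dict[str, dict]:
    """Combine information and tag fields into index records (role of sub-module 5066)."""
    records = {}
    for item_id in ids:
        info = fetch_info(item_id)  # role of sub-module 5062
        records[item_id] = {"info": info, "tags": tag_fields_for(info)}
    return records

# Hypothetical commodity information keyed by identification code.
CATALOG = {
    "1001": {"name": "Phone A", "category": "phone", "brand": "X", "screen_size": "6.1"},
    "3001": {"name": "Book B", "category": "book", "author": "Y", "publisher": "Z"},
}
records = build_index_records(["1001", "3001"], CATALOG.__getitem__)
```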
In some embodiments, the index generation sub-module 5066 may include a hash fragment distribution unit, a partial index generation unit, and a final index generation unit. The hash fragment distribution unit is configured to distribute the plurality of hash fragments to a plurality of servers. The partial index generation unit is configured to generate, in the plurality of servers, a plurality of partial indexes according to the information and the tag fields of the commodity data in the distributed hash fragments. The final index generation unit is configured to sort the commodity data in the plurality of partial indexes to generate the customized index.
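The hash fragment flow can be sketched in a single process as follows. Several details are assumptions not stated in the patent: MD5 is used only as a stable hash of the identification code, fragments are assigned to servers round-robin, the "servers" are simulated by plain function calls, and the final sort orders records by identification code.

```python
import hashlib
from typing import Dict, List

def fragment_of(item_id: str, num_fragments: int) -> int:
    """Map an identification code to a hash fragment (MD5 chosen for illustration)."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_fragments

def split_into_fragments(ids: List[str], num_fragments: int) -> List[List[str]]:
    """Divide the identification codes so that each lands in exactly one fragment."""
    fragments: List[List[str]] = [[] for _ in range(num_fragments)]
    for item_id in ids:
        fragments[fragment_of(item_id, num_fragments)].append(item_id)
    return fragments

def assign_to_servers(fragments: List[List[str]], num_servers: int) -> Dict[int, List[str]]:
    """Distribute fragments to servers round-robin (assumed policy)."""
    servers: Dict[int, List[str]] = {s: [] for s in range(num_servers)}
    for k, fragment in enumerate(fragments):
        servers[k % num_servers].extend(fragment)
    return servers

def build_partial_index(ids: List[str], records: Dict[str, dict]) -> Dict[str, dict]:
    """Build the partial index for the identification codes held by one server."""
    return {item_id: records[item_id] for item_id in ids}

def merge_partial_indexes(partials: List[Dict[str, dict]]) -> Dict[str, dict]:
    """Merge partial indexes and sort by identification code (assumed sort key)."""
    merged: Dict[str, dict] = {}
    for partial in partials:
        merged.update(partial)
    return dict(sorted(merged.items()))

ids = ["1005", "1001", "1004", "1002", "1003"]
records = {i: {"id": i} for i in ids}
fragments = split_into_fragments(ids, 4)
servers = assign_to_servers(fragments, 2)
partials = [build_partial_index(held, records) for held in servers.values()]
final_index = merge_partial_indexes(partials)
```

A content-based hash such as MD5 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would make the fragment assignment unrepeatable across runs.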
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to execute the method according to the embodiments of the present invention.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (7)

1. An index generation method for search engine testing, comprising:
acquiring a uniform resource locator list;
acquiring an identification code list of data according to the uniform resource locator list;
generating a customized index according to the identification code list of the data; and
sending the customized index to test equipment of a search engine so as to test the search engine according to the customized index;
wherein acquiring the uniform resource locator list comprises: extracting a plurality of search terms with the highest search frequency; and acquiring the uniform resource locator list from the plurality of search terms;
or, acquiring the uniform resource locator list comprises: acquiring the uniform resource locator list according to the test requirement of a search engine;
wherein the data comprises commodity data, and generating the customized index according to the identification code list of the data comprises: acquiring information of each commodity data in the identification code list of the data; according to the classification in the information of each commodity data, acquiring a label field of each commodity data; and generating the customized index according to the information and the label field of each commodity data.
2. The method of claim 1, wherein generating the customized index according to the information and the label field of each commodity data comprises:
dividing all commodity data in the identification code list of the data into a plurality of hash fragments according to the identification codes of all commodity data in the identification code list of the data, wherein each hash fragment contains the identification codes of part of commodity data in the identification code list of the data;
distributing the plurality of hash fragments to a plurality of servers;
in the plurality of servers, respectively generating a plurality of partial indexes according to the information and the label fields of the commodity data in the distributed hash fragments; and
sorting the commodity data in the plurality of partial indexes to generate the customized index.
3. An index generation apparatus for search engine testing, comprising:
a locator list acquisition module, configured to acquire a uniform resource locator list;
an identification code list acquisition module, configured to acquire an identification code list of data according to the uniform resource locator list;
a customized index generation module, configured to generate a customized index according to the identification code list of the data; and
a customized index sending module, configured to send the customized index to test equipment of a search engine so as to test the search engine according to the customized index;
wherein the locator list acquisition module comprises: a search term extraction sub-module, configured to extract a plurality of search terms with the highest search frequency, and a first locator acquisition sub-module, configured to acquire the uniform resource locator list from the plurality of search terms;
or, the locator list acquisition module comprises: a second locator acquisition sub-module, configured to acquire the uniform resource locator list according to the test requirement of the search engine;
wherein the data comprises commodity data, and the customized index generation module comprises: an information acquisition sub-module, configured to acquire information of each commodity data in the identification code list of the data; a field acquisition sub-module, configured to acquire a label field of each commodity data according to the classification in the information of each commodity data; and an index generation sub-module, configured to generate the customized index according to the information and the label field of each commodity data.
4. A search engine testing system, comprising:
a big data set server for generating a customized index according to the method of any of claims 1-2; and
a search engine test server, configured to test the search engine according to the customized index.
5. The system of claim 4, wherein the big data set server is a Hadoop cluster server.
6. A computer device, comprising: a memory, a processor, and executable instructions stored in the memory and executable on the processor, wherein the processor implements the method according to any of claims 1-2 when executing the executable instructions.
7. A computer-readable storage medium having computer-executable instructions stored thereon, wherein the executable instructions, when executed by a processor, implement the method of any of claims 1-2.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710390849.8A CN107239399B (en) 2017-05-27 2017-05-27 Index generation method, device and system for testing and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710390849.8A CN107239399B (en) 2017-05-27 2017-05-27 Index generation method, device and system for testing and readable storage medium

Publications (2)

Publication Number Publication Date
CN107239399A CN107239399A (en) 2017-10-10
CN107239399B true CN107239399B (en) 2020-06-05

Family

ID=59984638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710390849.8A Active CN107239399B (en) 2017-05-27 2017-05-27 Index generation method, device and system for testing and readable storage medium

Country Status (1)

Country Link
CN (1) CN107239399B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021505B (en) * 2017-12-05 2021-05-28 百度在线网络技术(北京)有限公司 Data online method and device and computer equipment
CN112579530B (en) * 2020-12-14 2024-05-14 莱诺斯科技(北京)股份有限公司 Data resource organization method and device of automatic test system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1963816A (en) * 2006-12-01 2007-05-16 清华大学 Automatization processing method of rating of merit of search engine
CN106168963A (en) * 2016-06-30 2016-11-30 北京金山安全软件有限公司 Real-time streaming data processing method and device and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002821B2 (en) * 2013-01-16 2015-04-07 Google Inc. Indexing application pages of native applications

Also Published As

Publication number Publication date
CN107239399A (en) 2017-10-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant