CN111209462A - Data processing method, device and equipment - Google Patents

Info

Publication number
CN111209462A
Authority
CN
China
Prior art keywords
group
service device
index information
version information
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010002937.8A
Other languages
Chinese (zh)
Other versions
CN111209462B (en
Inventor
张晋玮
白雅雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010002937.8A priority Critical patent/CN111209462B/en
Publication of CN111209462A publication Critical patent/CN111209462A/en
Application granted granted Critical
Publication of CN111209462B publication Critical patent/CN111209462B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the disclosure provides a data processing method, a device and equipment, wherein the method comprises the following steps: acquiring a request message, wherein the request message is used for requesting to acquire a related object of the first object, and the similarity between the related object and the first object is greater than or equal to a preset threshold value; determining latest version information corresponding to the current moment, and determining at least one target group in a plurality of groups of the distributed system based on the latest version information, wherein the version information of index information in service equipment in the target group is the latest version information, and the index information comprises characteristic information of objects in the service equipment; and requesting to obtain at least one reference object from the service equipment in at least one target group, and determining a related object in the at least one reference object based on the similarity between the reference object and the first object, wherein the similarity between the reference object and the first object is greater than or equal to a preset threshold value. The data processing efficiency is improved.

Description

Data processing method, device and equipment
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a data processing method, device and equipment.
Background
When a user searches for an object (e.g., audio, video, commodity information, etc.) in a network, after a target object is searched, the server may also recommend to the user a related object of the target object, for example, the related object may be an object having a high similarity to the target object.
In the related art, a server generally acquires key information corresponding to a target object and matches it against the key information of the other objects in the server to determine the related objects of the target object, which it then recommends to the user. In practice, however, the number of objects stored in the server is usually large, and matching the target object against the key information of the other objects one by one takes a long time. As a result, determining the related objects of the target object is inefficient; that is, the data processing efficiency of the server is low.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method, a data processing device and data processing equipment, and improves data processing efficiency.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
acquiring a request message, wherein the request message is used for requesting to acquire a related object of the first object, and the similarity between the related object and the first object is greater than or equal to a preset threshold value;
determining latest version information corresponding to the current moment, and determining at least one target group in a plurality of groups of a distributed system based on the latest version information, wherein the version information of index information in service equipment in the target group is the latest version information, and the index information comprises characteristic information of objects in the service equipment;
requesting to obtain at least one reference object from the service equipment in the at least one target group, and determining the related object in the at least one reference object based on the similarity between the reference object and the first object, wherein the similarity between the reference object and the first object is greater than or equal to the preset threshold.
In one possible embodiment, determining at least one target group among a plurality of groups based on the latest version information includes:
acquiring version information of each group in the plurality of groups;
determining a group of which version information is the same as the latest version information as the at least one target group.
In a possible implementation, the requesting, from the serving device in the at least one target group, at least one reference object includes:
determining at least one target serving device in the at least one target group;
and requesting to acquire at least one reference object from the at least one target service device.
In a possible embodiment, determining the relevant object in the at least one reference object based on the similarity between the reference object and the first object includes:
sorting the at least one reference object according to the similarity of the reference object and the first object from high to low;
and determining the first N reference objects in the at least one sorted reference object as the related objects, wherein N is an integer greater than or equal to 1.
In one possible embodiment, the method further comprises:
acquiring the state of each service device in the first group;
and when the state of each service device in the first group is a finished state, reconstructing the index information corresponding to the first group.
In one possible embodiment, the first group includes a master service device and a slave service device; reconstructing the index information corresponding to the first group includes:
setting the state of the master service device in the first group to a creation state, so that the master service device creates and stores reconstruction index information corresponding to the first group;
after determining that the master service device completes creation of the reconstruction index information corresponding to the first group, setting the state of the slave service device in the first group to an acquisition state, so that the slave service device acquires the reconstruction index information.
In one possible implementation, after determining that the master service device completes creating the reconstruction index information corresponding to the first group, the method further includes:
setting the state of the master service device to a to-be-switched state.
In a possible implementation manner, after setting the state of the slave service device in the first group to the acquisition state, the method further includes:
judging whether the slave service device has finished acquiring the reconstruction index information;
and if so, setting the state of the slave service device to the to-be-switched state.
In a possible implementation manner, after setting the state of the slave service device to the to-be-switched state, the method further includes:
when the states of the master service device and the slave service device are both the to-be-switched state, switching the index information of the master service device and the slave service device to the reconstruction index information;
and updating the version information of the first group into version information corresponding to the current moment.
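The state sequence described above (finished, creation, acquisition, to-be-switched, then a joint switch) can be sketched as a small synchronous simulation. This is only an illustrative sketch: the names `State`, `Device`, `Group` and `rebuild_group` are hypothetical, index building and copying are replaced by string assignment, and returning each device to the finished state after the switch is an assumption that the text does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    FINISHED = "finished"
    CREATING = "creating"
    ACQUIRING = "acquiring"
    TO_BE_SWITCHED = "to_be_switched"


@dataclass
class Device:
    name: str
    state: State = State.FINISHED
    index: str = "index-old"   # index currently used to serve queries
    pending: str = ""          # rebuilt index waiting to be switched in


@dataclass
class Group:
    master: Device
    slaves: List[Device]
    version: str


def rebuild_group(group: Group, new_index: str, new_version: str) -> bool:
    """Rebuild a group's index; return False if the group is not ready."""
    devices = [group.master, *group.slaves]
    # A rebuild starts only when every device is in the finished state.
    if any(d.state is not State.FINISHED for d in devices):
        return False

    # The master creates and stores the rebuilt index first.
    group.master.state = State.CREATING
    group.master.pending = new_index  # stand-in for real index construction
    group.master.state = State.TO_BE_SWITCHED

    # Each slave then acquires the rebuilt index from the master.
    for slave in group.slaves:
        slave.state = State.ACQUIRING
        slave.pending = group.master.pending  # stand-in for a real transfer
        slave.state = State.TO_BE_SWITCHED

    # Switch only when master and slaves are all waiting to switch.
    if all(d.state is State.TO_BE_SWITCHED for d in devices):
        for d in devices:
            d.index, d.pending = d.pending, ""
            d.state = State.FINISHED  # assumption: devices return to finished
        group.version = new_version
    return True


group = Group(Device("master"), [Device("slave-1"), Device("slave-2")], "V2019001")
rebuilt = rebuild_group(group, "index-new", "V2019002")

busy = Group(Device("master", state=State.CREATING), [Device("slave-1")], "V2019001")
skipped = rebuild_group(busy, "index-new", "V2019002")
```

Because the group version changes only after every device has switched, a reader that filters groups by version never sees a group whose devices serve different indexes.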
In one possible embodiment, the index information is created based on the Hierarchical Navigable Small World (HNSW) graph algorithm.
In a second aspect, an embodiment of the present disclosure provides a data processing apparatus, including a first obtaining module, a first determining module, a second obtaining module, and a second determining module, wherein,
the first obtaining module is configured to obtain a request message, where the request message is used to request to obtain a related object of the first object, and a similarity between the related object and the first object is greater than or equal to a preset threshold;
the first determining module is configured to determine latest version information corresponding to a current time, and determine at least one target group in a plurality of groups of a distributed system based on the latest version information, where version information of index information in service devices in the target group is the latest version information, and the index information includes feature information of objects in the service devices;
the second obtaining module is configured to request the service device in the at least one target group to obtain at least one reference object;
the second determining module is configured to determine the relevant object in the at least one reference object based on a similarity between the reference object and the first object, where the similarity between the reference object and the first object is greater than or equal to the preset threshold.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring version information of each group in the plurality of groups;
determining a group of which version information is the same as the latest version information as the at least one target group.
In a possible implementation manner, the second obtaining module is specifically configured to:
determining at least one target serving device in the at least one target group;
and requesting to acquire at least one reference object from the at least one target service device.
In a possible implementation manner, the second determining module is specifically configured to:
sorting the at least one reference object according to the similarity of the reference object and the first object from high to low;
and determining the first N reference objects in the at least one sorted reference object as the related objects, wherein N is an integer greater than or equal to 1.
In a possible implementation, the apparatus further includes a reconstruction module, wherein the reconstruction module is configured to:
acquiring the state of each service device in the first group;
and when the state of each service device in the first group is a finished state, reconstructing the index information corresponding to the first group.
In one possible embodiment, the first group includes a master service device and a slave service device; the reconstruction module is specifically configured to:
setting the state of the master service device in the first group to a creation state, so that the master service device creates and stores reconstruction index information corresponding to the first group;
after determining that the master service device completes creation of the reconstruction index information corresponding to the first group, setting the state of the slave service device in the first group to an acquisition state, so that the slave service device acquires the reconstruction index information.
In a possible implementation manner, the reconstruction module is further configured to set the state of the master service device to a to-be-switched state after the reconstruction module determines that the master service device completes creating the reconstruction index information corresponding to the first group.
In a possible implementation, after the reconstruction module sets the state of the slave service device in the first group to the acquisition state, the reconstruction module is further configured to:
judging whether the slave service device has finished acquiring the reconstruction index information;
and if so, setting the state of the slave service device to the to-be-switched state.
In a possible implementation manner, after the reconstruction module sets the state of the slave service device to the to-be-switched state, the reconstruction module is further configured to:
when the states of the master service device and the slave service device are both the to-be-switched state, switching the index information of the master service device and the slave service device to the reconstruction index information;
and updating the version information of the first group into version information corresponding to the current moment.
In one possible embodiment, the index information is created based on the Hierarchical Navigable Small World (HNSW) graph algorithm.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the data processing method of any one of the first aspects.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the data processing method according to any one of the first aspect is implemented.
According to the data processing method, device and equipment provided by the embodiments of the present disclosure, after the control device acquires a request message requesting the related object of a first object, it determines the latest version information corresponding to the current moment, determines at least one target group among the plurality of groups of the distributed system based on the latest version information, requests at least one reference object from the service devices in the at least one target group, and determines the related object among the at least one reference object based on the similarity between each reference object and the first object. In this process, the distributed system includes a plurality of groups, and the service devices in different groups hold index information for different objects, so each service device stores relatively few objects. In addition, each service device searches for reference objects using its stored index information, which includes the feature information of the objects. The service device can therefore quickly determine the reference objects of the first object, the control device can in turn quickly determine the related object of the first object, and data processing efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is an architecture diagram of a distributed system provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of another data processing method provided in the embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a reconstruction process of index information provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Fig. 1 is an architecture diagram of a distributed system provided by an embodiment of the present disclosure. Referring to fig. 1, the distributed system includes a control device and a plurality of groups, and each group includes at least one service device (also referred to as a worker).
The control device, which may also be referred to as a service router, manages the service devices in the plurality of groups. Each service device stores a plurality of objects and index information of each object; the objects may be audio, video, commodity information and the like, and the index information may include feature information of the objects. Service devices in the same group store the same objects and index information, while service devices in different groups store different objects and index information. The index information may be created based on the Hierarchical Navigable Small World (HNSW) graph algorithm.
In the actual application process, the objects in the service devices may be updated, for example periodically or in real time. The control device may periodically direct the service devices in each group to rebuild their index information. When the related object of a first object needs to be determined, the control device may determine at least one target group among the plurality of groups in the distributed system based on the latest version information corresponding to the current time, request reference objects of the first object from the target group, and determine the related object of the first object among the reference objects based on the similarity between each reference object and the first object. In this process, the distributed system includes a plurality of groups, and the service devices in different groups hold index information for different objects; that is, the objects (data) are sharded across different service devices, so each service device stores relatively few objects. In addition, each service device searches for reference objects using its stored index information, which includes the feature information of the objects. The service device can therefore quickly determine the reference objects of the first object, the control device can in turn quickly determine the related object of the first object, and data processing efficiency is improved.
Hereinafter, the technical means shown in the present disclosure will be described in detail by specific examples. It should be noted that the following embodiments may be combined with each other, and the description of the same or similar contents in different embodiments is not repeated.
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present disclosure. Referring to fig. 2, the method may include:
S201, obtaining a request message, wherein the request message is used for requesting to obtain a related object of the first object.
The execution subject of the embodiment of the present disclosure may be a control device in a distributed system. For example, the control device may be the control device shown in fig. 1.
The first object may be audio, video, merchandise information, question and answer information, etc. Of course, the first object may be other objects, and the embodiment of the present disclosure is not particularly limited thereto.
The related object of the first object refers to an object with higher similarity to the first object, that is, the similarity of the related object to the first object is greater than or equal to a preset threshold value. For example, the preset threshold may be 80%, 90%, etc., and may be set according to actual needs.
Alternatively, the request message may come from the client, that is, the client sends the request message to the control device. For example, after the client requests to obtain the first object, the client sends the request message to the control device to request the related object of the first object.
Alternatively, the request message may also come from a server, which may be a device providing query service to the client, for example, the client requests to obtain the first object from the server, and after the server queries to obtain the first object required by the client, the server sends a request message to the control device to request to obtain the relevant object of the first object.
S202, determining latest version information corresponding to the current moment, and determining at least one target group in a plurality of groups of the distributed system based on the latest version information.
The version information of the index information in the service equipment in the target group is the latest version information, and the index information comprises the characteristic information of the object in the service equipment.
The index information shown in the embodiment of the present disclosure may be HNSW index information, that is, the index information may be created based on the HNSW algorithm, for example by using a C++ implementation of the HNSW algorithm.
Different time periods may correspond to different version information, and the version information corresponding to the current time period at which the current time is located may be referred to as latest version information. The duration of one time period may be 1 hour, 10 hours, one day, etc., and the duration of one time period may be set according to actual needs.
For example, the correspondence between the time period and the version information may be as shown in table 1:
TABLE 1
Time period                          Version information
……                                   ……
0:00-24:00 on January 1, 2019        V2019001
0:00-24:00 on January 2, 2019        V2019002
0:00-24:00 on January 3, 2019        V2019003
……                                   ……
Referring to table 1, the duration of a time period is one day. For example, the version information corresponding to the period 0:00-24:00 on January 1, 2019 is V2019001, the version information corresponding to the period 0:00-24:00 on January 2, 2019 is V2019002, and the version information corresponding to the period 0:00-24:00 on January 3, 2019 is V2019003. Assuming that the current time is 12:00 on January 3, 2019, the latest version information corresponding to the current time is V2019003.
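The mapping from a moment to its version information can be illustrated with a short sketch. The label format used here ("V", the year, then a three-digit day of the year) is an assumption inferred from the three rows of table 1, and `version_for` is a hypothetical helper name; a period length other than one day would need a different rule.

```python
from datetime import datetime


def version_for(moment: datetime) -> str:
    """Return the version label of the one-day period containing `moment`."""
    # Assumed format, inferred from table 1: "V" + year + 3-digit day of year.
    return f"V{moment.year}{moment.timetuple().tm_yday:03d}"


# Current time taken from the example in the text: 12:00 on January 3, 2019.
latest_version = version_for(datetime(2019, 1, 3, 12, 0))
print(latest_version)  # V2019003
```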
The service device of each group includes index information, and the index information is continuously updated (reconstructed); for example, the index information may be reconstructed periodically, or the next reconstruction may start as soon as the previous one completes. Each piece of index information has corresponding version information, namely the version information of the time period in which the update of the index information was completed. For example, if the update of the index information is completed at 3:00 on January 2, 2019, the version information of the index information is V2019002.
In the actual application process, updating the index information takes a certain amount of time, so while the update is in progress the version information of the index information may differ from the latest version information. For example, from 0:00 to 24:00 on January 1, 2019, the version information of the index information is V2019001. Assuming that the index information starts to be updated at 0:00 on January 2, 2019 and the update is completed at 1:00 on January 2, 2019, then between 0:00 and 1:00 on January 2, 2019 the version information of the index information is V2019001 while the latest version information is V2019002; that is, the version information of the index information differs from the latest version information.
Within a group, the version information of every service device is the same. The version information of the index information in a service device may be referred to as the version information of the service device, and the version information of the service devices in a group may be referred to as the version information of the group. Because a group may contain multiple service devices whose index updates take different amounts of time, the version information of the index information is adjusted only after every service device in the group has finished updating its index information, so that the version information of the service devices and of the group changes together.
The target group may be determined by acquiring version information of each of the plurality of groups, and determining a group having the same version information as the latest version information as the target group. The number of target groups may be one or more.
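As a minimal sketch of this selection step, the control device can filter the groups by comparing each group's version information with the latest version information. The group names and the `select_target_groups` helper are hypothetical.

```python
from typing import Dict, List


def select_target_groups(group_versions: Dict[str, str], latest_version: str) -> List[str]:
    """Return the groups whose version information equals the latest version."""
    return [group for group, version in group_versions.items()
            if version == latest_version]


# group-2 is still rebuilding its index, so it lags one version behind.
group_versions = {
    "group-1": "V2019003",
    "group-2": "V2019002",
    "group-3": "V2019003",
}
targets = select_target_groups(group_versions, "V2019003")
print(targets)  # ['group-1', 'group-3']
```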
In the actual application process, the information of the service devices in the target group can also be acquired. Over a short time, the target groups and the service devices in them may not change. Therefore, once the target groups and their service devices have been obtained, they can be cached, so that the next time they are needed they can be read from the cache, which improves processing efficiency.
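One way to realize such a cache is a small time-to-live (TTL) cache keyed by the latest version information. This is a sketch under assumptions: the text does not specify an expiry policy, so the TTL, the `TargetGroupCache` class and the `load_targets` loader are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

Targets = Dict[str, List[str]]  # target group -> service devices in that group


class TargetGroupCache:
    """Cache target groups per version for a short, assumed time-to-live."""

    def __init__(self, loader: Callable[[str], Targets], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Targets]] = {}

    def get(self, latest_version: str) -> Targets:
        now = self._clock()
        entry = self._entries.get(latest_version)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the lookup
        value = self._loader(latest_version)
        self._entries[latest_version] = (now, value)
        return value


calls: List[str] = []


def load_targets(version: str) -> Targets:
    # Hypothetical stand-in for querying the group and device metadata.
    calls.append(version)
    return {"group-1": ["worker-a", "worker-b"]}


cache = TargetGroupCache(load_targets, ttl_seconds=60.0)
first = cache.get("V2019003")
second = cache.get("V2019003")  # served from the cache, loader not called again
```

Keying the cache by version means that a change in the latest version information naturally bypasses stale entries.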
S203, at least one reference object is requested to be acquired from the service equipment in at least one target group.
Wherein the similarity between the reference object and the first object is greater than or equal to a preset threshold value.
At least one target service device may be first determined in at least one target group, and at least one reference object may be requested from the target service device, respectively. One target service device may be determined in one target group, for example, for any one target group, one service device with the smallest load in the target group may be determined as the target service device.
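The least-load choice mentioned above can be sketched as follows. How load is measured is not specified in the text, so the numeric load values, the device names and the `pick_target_devices` helper are illustrative assumptions.

```python
from typing import Dict


def pick_target_devices(group_loads: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each target group, pick the service device with the smallest load."""
    return {
        group: min(loads, key=loads.get)
        for group, loads in group_loads.items()
        if loads  # skip groups that currently have no service devices
    }


# Hypothetical load figures (for example, in-flight requests per device).
group_loads = {
    "group-1": {"worker-a": 12.0, "worker-b": 3.0},
    "group-3": {"worker-e": 7.0, "worker-f": 9.0},
}
chosen = pick_target_devices(group_loads)
print(chosen)  # {'group-1': 'worker-b', 'group-3': 'worker-e'}
```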
The control device may send a first request message to the target service device, where the first request message includes feature information of the first object, and the target service device matches the feature information of the first object with index information stored therein to obtain a reference object of the first object.
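The matching performed by a target service device can be illustrated with a deliberately simplified sketch. It scans every stored feature vector and uses cosine similarity, which is an assumed measure; an HNSW index, as described in the disclosure, would replace this exhaustive scan with an approximate nearest-neighbor search over a graph. The function names and sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_reference_objects(query: List[float], index: Dict[str, List[float]],
                           threshold: float) -> List[Tuple[str, float]]:
    """Return (object id, similarity) pairs whose similarity meets the threshold."""
    hits = []
    for object_id, features in index.items():
        similarity = cosine(query, features)
        if similarity >= threshold:
            hits.append((object_id, similarity))
    return hits


# Hypothetical feature vectors held by one service device.
index = {
    "video-1": [1.0, 0.0],
    "video-2": [0.9, 0.1],
    "video-3": [0.0, 1.0],
}
hits = find_reference_objects([1.0, 0.0], index, threshold=0.8)
print([object_id for object_id, _ in hits])  # ['video-1', 'video-2']
```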
S204, determining related objects of the first object in the at least one reference object.
The at least one reference object may be ranked in order of similarity between the reference object and the first object from high to low, and the top N reference objects in the ranked at least one reference object may be determined as related objects, where N is an integer greater than or equal to 1.
In practical application, N is usually the same as the number of presentation bits for presenting the relevant object in the client. The presentation position refers to a position of a page shown by the client side for presenting the related object. For example, assuming that there are 5 display bits in the page displayed by the client to display the relevant video (relevant object) of the video (object), N is 5.
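The ranking step can be sketched as a sort followed by a slice. The `(object id, similarity)` tuple layout and the sample values are assumptions made for illustration; the reference objects are imagined as already merged from several target service devices.

```python
from typing import List, Tuple


def top_related(reference_objects: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Sort reference objects by similarity, high to low, and keep the first n."""
    ranked = sorted(reference_objects, key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Reference objects returned by two hypothetical target service devices.
merged = [("a", 0.91), ("b", 0.97), ("c", 0.85), ("d", 0.99)]
related = top_related(merged, n=2)
print(related)  # [('d', 0.99), ('b', 0.97)]
```

If fewer than N reference objects are available, the slice simply returns all of them.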
Optionally, after the related object of the first object is determined, the related object may be fed back to the requester. For example, when the request message in S201 comes from the client, the control device may send the related object of the first object to the client, and when the request message in S201 comes from the server, the control device may send the related object of the first object to the server.
According to the data processing method provided by the embodiment of the present disclosure, after the control device acquires a request message requesting the related object of the first object, it determines the latest version information corresponding to the current time, determines at least one target group among the plurality of groups of the distributed system based on the latest version information, requests at least one reference object from the service devices in the at least one target group, and determines the related object among the at least one reference object based on the similarity between each reference object and the first object. In this process, the distributed system includes a plurality of groups, and the service devices in different groups hold index information for different objects, so each service device stores relatively few objects. In addition, each service device searches for reference objects using its stored index information, which includes the feature information of the objects. The service device can therefore quickly determine the reference objects of the first object, the control device can in turn quickly determine the related object of the first object, and data processing efficiency is improved.
The control device can manage the service devices in the distributed system and control the service devices to construct the index information.
The control device may manage the service devices as follows. After the control device is started, it may obtain a directory (e.g., a psm directory) and a group number from a preset database (e.g., a mongo database), where device information of the service devices is stored under the directory, and the control device may monitor the service devices under the directory. For example, when a service device is newly added to the directory, the control device may assign a group and role information to the service device, where the role information indicates either a master service device or a slave service device. A group includes one master service device and one or more slave service devices. The control device may create an indexcclusterinfo class and maintain the information of the groups and of the service devices in the distributed system through the indexcclusterinfo class. The control device may also create a class group info object for each group and maintain the information of that group through the object, where the information of the group may include: the service devices included in the group, the master service device in the group, the version information of the group, the latest version information at the current time, and the like. The control device may also create an object of class WorkerInfo for each service device and maintain the information of that service device through the object; for example, the information of the service device may include: version information of the service device, role information of the service device (master service device or slave service device), the group to which the service device belongs, and the like. The data in the control device is stored persistently, so that the control device can load the latest data after it is restarted.
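The bookkeeping described above might look like the following sketch. Only the name WorkerInfo comes from the description; ClusterInfo, GroupInfo, the field names, and the placement rule (put a new device in the smallest group, and make the first device of a group its master) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkerInfo:
    worker_id: str
    group_id: int
    role: str          # "master" or "slave"
    version: str = ""  # version information of the service device


@dataclass
class GroupInfo:
    group_id: int
    workers: List[str] = field(default_factory=list)
    master: Optional[str] = None
    version: str = ""  # version information of the group


@dataclass
class ClusterInfo:
    group_count: int
    groups: Dict[int, GroupInfo] = field(default_factory=dict)
    workers: Dict[str, WorkerInfo] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for gid in range(self.group_count):
            self.groups.setdefault(gid, GroupInfo(gid))

    def register(self, worker_id: str) -> WorkerInfo:
        """Assign a group and a role to a newly detected service device."""
        # Assumed policy: the smallest group receives the new device.
        gid = min(self.groups, key=lambda g: len(self.groups[g].workers))
        group = self.groups[gid]
        role = "slave" if group.master else "master"
        if role == "master":
            group.master = worker_id
        group.workers.append(worker_id)
        info = WorkerInfo(worker_id, gid, role, group.version)
        self.workers[worker_id] = info
        return info


cluster = ClusterInfo(group_count=2)
first, second, third = (cluster.register(w) for w in ("w1", "w2", "w3"))
```

Persisting these objects, as the description requires, is left out of the sketch.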
When the control device detects that a service device is down, the control device may delete the information of that service device; if the service device is the master service device of a group, the control device further determines a new master service device for the group.
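A possible handling of a failed service device is sketched below. The description does not say how the new master service device is chosen; promoting the first remaining device of the group is purely an assumption, as are the function name and the dictionary layout.

```python
from typing import Dict, Optional


def handle_worker_down(groups: Dict[int, dict], worker_id: str) -> Optional[str]:
    """Remove a failed device; return the newly promoted master, if any."""
    for group in groups.values():
        if worker_id not in group["workers"]:
            continue
        group["workers"].remove(worker_id)
        if group["master"] != worker_id:
            return None  # a slave failed, so the master is unchanged
        # Assumed election rule: promote the first remaining device.
        group["master"] = group["workers"][0] if group["workers"] else None
        return group["master"]
    return None  # unknown device, nothing to do


groups = {0: {"workers": ["m", "s1", "s2"], "master": "m"}}
promoted = handle_worker_down(groups, "m")
```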
The control device controlling the service devices to construct the index information may include the following. After the control device detects a new service device in the distributed system and allocates a group and role information to the new service device, the control device controls the service device to create the index. For example, when the service device is a master service device, the control device may control the service device to create the index; when the service device is a slave service device, the control device may control the service device to obtain the index created by the master service device in its group. While the service devices are running, the control device may further control them to perform index reconstruction. The process of controlling the service devices to perform index reconstruction is described below with reference to fig. 3.
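The role-dependent behavior for a newly registered service device can be sketched as follows. The Index type, the function prepare_index, and the in-memory shared_store that stands in for shared storage are hypothetical names used only for this illustration.

```python
from typing import Callable, Dict, List

Index = Dict[str, List[float]]  # hypothetical: object id -> feature vector


def prepare_index(role: str,
                  create_index: Callable[[], Index],
                  fetch_master_index: Callable[[], Index]) -> Index:
    """Return the index that a newly registered service device should load."""
    if role == "master":
        return create_index()        # a master builds the index itself
    if role == "slave":
        return fetch_master_index()  # a slave obtains the master's index
    raise ValueError(f"unknown role: {role}")


shared_store: Dict[str, Index] = {}  # stands in for shared storage


def create() -> Index:
    index = {"video-1": [0.1, 0.9]}
    shared_store["group-0"] = index  # the master publishes its index
    return index


def fetch() -> Index:
    return shared_store["group-0"]


master_index = prepare_index("master", create, fetch)
slave_index = prepare_index("slave", create, fetch)
```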
Fig. 3 is a schematic flow chart of another data processing method according to the embodiment of the present disclosure. Referring to fig. 3, the method may include:
S301, acquiring the state of each service device in the first group.
The first group is any one of the groups in the distributed system.
The state of a service device may be a creation state (creating), an acquisition state (dispatching), a to-be-switched state (waitforswap), or a completion state (finished). The states of the master service device include the creation state, the to-be-switched state, and the completion state; the states of a slave service device include the acquisition state, the to-be-switched state, and the completion state.
When the state of the master service device is set to the creation state, the master service device creates the index information. When the master service device finishes creating the index information, its state is set to the to-be-switched state. When the master service device and the slave service devices in the first group are all in the to-be-switched state, the state of the master service device is set to the completion state.
When the state of a slave service device is set to the acquisition state, the slave service device acquires the index information created by the master service device. When the slave service device has acquired the index information created by the master service device, its state is set to the to-be-switched state. When the master service device and the slave service devices in the first group are all in the to-be-switched state, the state of the slave service device is set to the completion state.
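The states and transitions described above can be written down as a small table. The state strings follow the names given in parentheses in the description; the State enum, the TRANSITIONS table, and the helper names are assumptions, and the edge leaving the completion state assumes that a new round of index creation starts from that state.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    CREATING = "creating"            # creation state (master only)
    DISPATCHING = "dispatching"      # acquisition state (slave only)
    WAIT_FOR_SWAP = "waitforswap"    # to-be-switched state
    FINISHED = "finished"            # completion state


# Allowed transitions per role.
TRANSITIONS = {
    "master": {State.FINISHED: State.CREATING,
               State.CREATING: State.WAIT_FOR_SWAP,
               State.WAIT_FOR_SWAP: State.FINISHED},
    "slave": {State.FINISHED: State.DISPATCHING,
              State.DISPATCHING: State.WAIT_FOR_SWAP,
              State.WAIT_FOR_SWAP: State.FINISHED},
}


def next_state(role: str, current: State) -> State:
    """Return the state that follows `current` for the given role."""
    return TRANSITIONS[role][current]


def can_finish(states: Iterable[State]) -> bool:
    """True when every device in the group is in the to-be-switched state."""
    states = list(states)
    return bool(states) and all(s is State.WAIT_FOR_SWAP for s in states)
```

A master never enters the acquisition state and a slave never enters the creation state, which the per-role tables make explicit.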
S302, when the state of each service device in the first group is the completion state, setting the state of the master service device in the first group to the creation state, so that the master service device creates and stores the reconstruction index information corresponding to the first group.
When every service device in the first group is in the completion state, this indicates that the index information of both the master service device and the slave service devices in the first group has been updated to the most recently created index information.
After the control device sets the state of the master service device in the first group to the creation state, the master service device creates index information according to its state (the creation state); for example, the master service device may create the index information corresponding to the first group by using the HNSW algorithm. After the master service device creates the index information corresponding to the first group, it stores that index information. For example, the master service device may store the index information corresponding to the first group in the Hadoop Distributed File System (HDFS).
S303, after it is determined that the master service device has finished creating the reconstruction index information corresponding to the first group, setting the state of the master service device to the to-be-switched state.
S304, after the master service device is determined to complete the creation of the reconstruction index information corresponding to the first group, the state of the slave service device in the first group is set to be the acquisition state, so that the slave service device acquires the reconstruction index information.
After the state of the slave service device in the first group is set to the acquisition state, the slave service device starts to acquire the reconstruction index information created by the master service device. For example, the slave service apparatus may acquire reconstruction index information created by the master service apparatus from the HDFS.
S305, after it is determined that the slave service device has finished acquiring the reconstruction index information, setting the state of the slave service device to the to-be-switched state.
S306, when the states of the master service device and the slave service device are both the to-be-switched state, switching the index information of the master service device and the slave service device to the reconstruction index information.
When the master service device and the slave service device are both in the to-be-switched state, the reconstruction index information exists in both of them. Their index information can therefore be switched to the reconstruction index information together, which ensures that the index information in the master service device and in the slave service device remains consistent.
S307, updating the version information of the first group to the version information corresponding to the current time.
The version information of each service device (master service device and slave service device) in the first group and the version information of the index information in each service device may also be updated to the version information corresponding to the current time.
In the embodiment shown in fig. 3, the control device may control the service device to perform index reconstruction, and during the index reconstruction, it may be ensured that index information in the master service device and the slave service device in the group is consistent.
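One way a service device could hold the reconstruction index information until the switch is sketched below. The IndexHolder class, its method names, and swap_group are hypothetical; the sketch runs in a single process, so it only illustrates the rule that no device switches until every device in the group has the rebuilt index, not a distributed protocol.

```python
import threading
from typing import Dict, List, Optional

Index = Dict[str, List[float]]  # hypothetical: object id -> feature vector


class IndexHolder:
    """Holds the index being served plus a staged replacement."""

    def __init__(self, active: Index) -> None:
        self._lock = threading.Lock()
        self._active = active
        self._staged: Optional[Index] = None

    def stage(self, rebuilt: Index) -> None:
        """Store the rebuilt index without serving it yet."""
        with self._lock:
            self._staged = rebuilt

    def ready(self) -> bool:
        with self._lock:
            return self._staged is not None

    def swap(self) -> None:
        """Start serving the staged index."""
        with self._lock:
            if self._staged is None:
                raise RuntimeError("no rebuilt index staged")
            self._active, self._staged = self._staged, None

    def active(self) -> Index:
        with self._lock:
            return self._active


def swap_group(holders: List[IndexHolder]) -> bool:
    """Switch every device only if all of them hold the rebuilt index."""
    if not all(h.ready() for h in holders):
        return False
    for holder in holders:
        holder.swap()
    return True
```

Until swap_group returns True, every device keeps answering queries from its old index, so the master and slave service devices never serve different index versions for long.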
Next, the method shown in the embodiment of fig. 3 is explained by way of a specific example with reference to fig. 4.
Fig. 4 is a schematic diagram of a reconstruction process of index information provided in the embodiment of the present disclosure. Referring to fig. 4, it is assumed that one master service device and two slave service devices (denoted as slave service device 1 and slave service device 2, respectively) are included in one group.
In the actual application process, when it is determined that the master service device and the two slave service devices are all in the completion state, reconstruction of the index information is started. First, the state of the master service device is set to the creation state, and the master service device starts to create the reconstruction index information. After the reconstruction index information is created, the master service device stores it in a database, and the control device sets the state of the master service device to the to-be-switched state.
After the master service device stores the reconstruction index information in the database, the control device sets the states of the two slave service devices to the acquisition state, and the slave service devices start to acquire the reconstruction index information from the database. After a slave service device has acquired the reconstruction index information, its state is set to the to-be-switched state.
After the master service device and the two slave service devices have all been set to the to-be-switched state, the control device controls the master service device and the slave service devices to switch their index information to the reconstruction index information, that is, the index information in the master service device and the slave service devices is replaced by the reconstruction index information, and the version information of the group to which they belong is updated to the version information corresponding to the current time.
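The example with one master service device and two slave service devices can be replayed with the sketch below. It is a synchronous, single-process simulation: the function rebuild_group, the dictionary layout of a group, the build callable, and the store dictionary standing in for the database are all assumptions, while the state strings follow the names used in the description.

```python
from typing import Callable, Dict, List

FINISHED, CREATING = "finished", "creating"
DISPATCHING, WAIT = "dispatching", "waitforswap"


def rebuild_group(group: dict, build: Callable[[], dict],
                  store: Dict[str, dict], now_version: str) -> bool:
    """Run one rebuild round (S301 to S307); return True if it completed."""
    states: Dict[str, str] = group["states"]
    master: str = group["master"]
    slaves: List[str] = [w for w in states if w != master]
    # S301/S302: start only when every device is in the completion state.
    if any(s != FINISHED for s in states.values()):
        return False
    states[master] = CREATING
    store[group["id"]] = build()        # master creates and stores the index
    states[master] = WAIT               # S303
    staged = {master: store[group["id"]]}
    for slave in slaves:
        states[slave] = DISPATCHING     # S304
        staged[slave] = store[group["id"]]  # slave fetches the stored index
        states[slave] = WAIT            # S305
    # S306: every device holds the rebuilt index, so switch them together.
    if all(s == WAIT for s in states.values()):
        group["indexes"].update(staged)
        for worker in states:
            states[worker] = FINISHED
        group["version"] = now_version  # S307
        return True
    return False


group = {"id": "g0", "master": "m", "version": "v1", "indexes": {},
         "states": {"m": FINISHED, "s1": FINISHED, "s2": FINISHED}}
store: Dict[str, dict] = {}
done = rebuild_group(group, lambda: {"x": [1.0]}, store, "v2")
```

In a real deployment the slaves would fetch the index asynchronously and the control device would poll their states; the simulation collapses that waiting into a loop.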
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure. Referring to fig. 5, the data processing apparatus 10 includes a first obtaining module 11, a first determining module 12, a second obtaining module 13, and a second determining module 14, wherein,
the first obtaining module 11 is configured to obtain a request message, where the request message is used to request to obtain a related object of the first object, and a similarity between the related object and the first object is greater than or equal to a preset threshold;
the first determining module 12 is configured to determine latest version information corresponding to a current time, and determine at least one target group in a plurality of groups of a distributed system based on the latest version information, where version information of index information in service devices in the target group is the latest version information, and the index information includes feature information of objects in the service devices;
the second obtaining module 13 is configured to request the service device in the at least one target group to obtain at least one reference object;
the second determining module 14 is configured to determine the relevant object in the at least one reference object based on a similarity between the reference object and the first object, where the similarity between the reference object and the first object is greater than or equal to the preset threshold.
The data processing apparatus provided in the embodiment of the present disclosure may execute the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
In a possible implementation, the first determining module 12 is specifically configured to:
acquiring version information of each group in the plurality of groups;
determining a group of which version information is the same as the latest version information as the at least one target group.
In a possible implementation manner, the second obtaining module 13 is specifically configured to:
determining at least one target serving device in the at least one target group;
and requesting to acquire at least one reference object from the at least one target service device.
In a possible implementation, the second determining module 14 is specifically configured to:
sorting the at least one reference object according to the similarity of the reference object and the first object from high to low;
and determining the first N reference objects in the at least one sorted reference object as the related objects, wherein N is an integer greater than or equal to 1.
Fig. 6 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present disclosure. On the basis of the embodiment shown in fig. 5, please refer to fig. 6, the data processing apparatus 10 further includes a reconstruction module 15, wherein the reconstruction module 15 is configured to:
acquiring the state of each service device in the first group;
and when the state of each service device in the first group is the completion state, reconstructing the index information corresponding to the first group.
In one possible embodiment, the first group includes a master service device and a slave service device; the reconstruction module 15 is specifically configured to:
setting the state of the master service device in the first group to the creation state, so that the master service device creates and stores reconstruction index information corresponding to the first group;
after determining that the master service device completes creation of reconstruction index information corresponding to the first group, setting the state of the slave service device in the first group to an acquisition state, so that the slave service device acquires the reconstruction index information.
In a possible implementation manner, the reconstruction module 15 is further configured to set the state of the master service device to the to-be-switched state after determining that the master service device has completed creating the reconstruction index information corresponding to the first group.
In a possible implementation manner, after setting the state of the slave service device in the first group to the acquisition state, the reconstruction module 15 is further configured to:
judging whether the slave service device has finished acquiring the reconstruction index information;
and if so, setting the state of the slave service device to the to-be-switched state.
In a possible implementation manner, after setting the state of the slave service device to the to-be-switched state, the reconstruction module 15 is further configured to:
when the states of the master service device and the slave service device are both the to-be-switched state, switching the index information of the master service device and the slave service device to the reconstruction index information;
and updating the version information of the first group into version information corresponding to the current moment.
In one possible embodiment, the index information is created based on the Hierarchical Navigable Small World (HNSW) graph algorithm.
The data processing apparatus provided in the embodiment of the present disclosure may execute the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 20 may be a terminal device or a server. Among them, the terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a Digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a Digital TV, a desktop computer, etc. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
Referring to fig. 7, the electronic device 20 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 21, which may perform various suitable actions and processes according to a program stored in a Read Only Memory (ROM) 22 or a program loaded from a storage device 28 into a Random Access Memory (RAM) 23. The RAM 23 also stores various programs and data necessary for the operation of the electronic device 20. The processing device 21, the ROM 22, and the RAM 23 are connected to each other via a bus 24. An input/output (I/O) interface 25 is also connected to the bus 24.
Generally, the following devices may be connected to the I/O interface 25: an input device 26 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 27 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 28 including, for example, magnetic tape, hard disk, etc.; and a communication device 29. The communication device 29 may allow the electronic device 20 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 20 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 29, or installed from the storage device 28, or installed from the ROM 22. The computer program, when executed by the processing device 21, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present disclosure, and not for limiting the same; although embodiments of the present disclosure have been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present disclosure.

Claims (13)

1. A data processing method, comprising:
acquiring a request message, wherein the request message is used for requesting to acquire a related object of a first object, and the similarity between the related object and the first object is greater than or equal to a preset threshold value;
determining latest version information corresponding to the current moment, and determining at least one target group in a plurality of groups of a distributed system based on the latest version information, wherein the version information of index information in service equipment in the target group is the latest version information, and the index information comprises characteristic information of objects in the service equipment;
requesting to obtain at least one reference object from the service equipment in the at least one target group, and determining the related object in the at least one reference object based on the similarity between the reference object and the first object, wherein the similarity between the reference object and the first object is greater than or equal to the preset threshold.
2. The method of claim 1, wherein determining at least one target group among a plurality of groups based on the latest version information comprises:
acquiring version information of each group in the plurality of groups;
determining a group of which version information is the same as the latest version information as the at least one target group.
3. The method according to claim 1 or 2, wherein requesting at least one reference object from a serving device in the at least one target group comprises:
determining at least one target serving device in the at least one target group;
and requesting to acquire at least one reference object from the at least one target service device.
4. The method according to any one of claims 1-3, wherein determining the relevant object in the at least one reference object based on the similarity of the reference object to the first object comprises:
sorting the at least one reference object according to the similarity of the reference object and the first object from high to low;
and determining the first N reference objects in the at least one sorted reference object as the related objects, wherein N is an integer greater than or equal to 1.
5. The method according to any one of claims 1-4, further comprising:
acquiring the state of each service device in the first group;
and when the state of each service device in the first group is a completion state, reconstructing the index information corresponding to the first group.
6. The method of claim 5, wherein the first group includes a master service device and a slave service device; and reconstructing the index information corresponding to the first group comprises:
setting the state of the master service device in the first group to a creation state, so that the master service device creates and stores reconstruction index information corresponding to the first group;
after determining that the master service device completes creation of reconstruction index information corresponding to the first group, setting the state of the slave service device in the first group to an acquisition state, so that the slave service device acquires the reconstruction index information.
7. The method of claim 6, further comprising, after determining that the master service device completes creation of the reconstruction index information corresponding to the first group:
setting the state of the master service device to a to-be-switched state.
8. The method according to claim 6 or 7, further comprising, after setting the state of the slave service device in the first group to the acquisition state:
judging whether the slave service device has finished acquiring the reconstruction index information;
and if so, setting the state of the slave service device to a to-be-switched state.
9. The method according to claim 8, further comprising, after setting the state of the slave service device to the to-be-switched state:
when the states of the master service device and the slave service device are both the to-be-switched state, switching the index information of the master service device and the slave service device to the reconstruction index information;
and updating the version information of the first group into version information corresponding to the current moment.
10. The method according to any one of claims 1 to 9, wherein the index information is created based on a Hierarchical Navigable Small World (HNSW) graph algorithm.
11. A data processing apparatus comprising a first obtaining module, a first determining module, a second obtaining module, and a second determining module, wherein,
the first obtaining module is configured to obtain a request message, where the request message is used to request to obtain a related object of the first object, and a similarity between the related object and the first object is greater than or equal to a preset threshold;
the first determining module is configured to determine latest version information corresponding to a current time, and determine at least one target group in a plurality of groups of a distributed system based on the latest version information, where version information of index information in service devices in the target group is the latest version information, and the index information includes feature information of objects in the service devices;
the second obtaining module is configured to request the service device in the at least one target group to obtain at least one reference object;
the second determining module is configured to determine the relevant object in the at least one reference object based on a similarity between the reference object and the first object, where the similarity between the reference object and the first object is greater than or equal to the preset threshold.
12. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
execution of computer-executable instructions stored by the memory by the at least one processor causes the at least one processor to perform the data processing method of any of claims 1-10.
13. A computer-readable storage medium, having stored thereon computer-executable instructions, which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 10.
CN202010002937.8A 2020-01-02 2020-01-02 Data processing method, device and equipment Active CN111209462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010002937.8A CN111209462B (en) 2020-01-02 2020-01-02 Data processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN111209462A true CN111209462A (en) 2020-05-29
CN111209462B CN111209462B (en) 2021-05-18

Family

ID=70785680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010002937.8A Active CN111209462B (en) 2020-01-02 2020-01-02 Data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111209462B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004778A (en) * 2010-11-19 2011-04-06 清华大学 Text index online updating method in cloud environment
CN102426589A (en) * 2011-10-31 2012-04-25 合一网络技术(北京)有限公司 Interlayer system used for searching database information and information searching method
JP2012247815A (en) * 2011-05-25 2012-12-13 Canon Inc Index setting method of document management system
CN105160039A (en) * 2015-10-13 2015-12-16 四川携创信息技术服务有限公司 Query method based on big data
CN106055622A (en) * 2016-05-26 2016-10-26 浪潮软件集团有限公司 Data searching method and system
CN106294695A (en) * 2016-08-08 2017-01-04 深圳市网安计算机安全检测技术有限公司 A kind of implementation method towards the biggest data search engine
CN108140031A (en) * 2015-10-02 2018-06-08 谷歌有限责任公司 Equity can synchronize storage system
CN108520002A (en) * 2018-03-12 2018-09-11 平安科技(深圳)有限公司 Data processing method, server and computer storage media
CN110505513A (en) * 2019-08-15 2019-11-26 咪咕视讯科技有限公司 A kind of video interception method, apparatus, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113759362A (en) * 2021-07-28 2021-12-07 西安电子科技大学 Radar target data association method, device, equipment and storage medium
CN113759362B (en) * 2021-07-28 2024-02-23 西安电子科技大学 Method, device, equipment and storage medium for radar target data association
WO2023024461A1 (en) * 2021-08-27 2023-03-02 上海商汤智能科技有限公司 Index rebuild method, apparatus, and device, medium, chip, product, and program

Also Published As

Publication number Publication date
CN111209462B (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN110909521B (en) Online document information synchronous processing method and device and electronic equipment
CN111209462B (en) Data processing method, device and equipment
CN111857720B (en) User interface state information generation method and device, electronic equipment and medium
CN112115217A (en) Data processing method and device for high-precision map, electronic equipment and storage medium
CN113347276B (en) Mobile access system based on GIS
US20150178393A1 (en) Specialized virtual personal assistant setup
CN112199923A (en) Identification generation method, system, device and medium based on distributed system
CN110134905B (en) Page update display method, device, equipment and storage medium
CN109614089B (en) Automatic generation method, device, equipment and storage medium of data access code
CN113886353B (en) Data configuration recommendation method and device for hierarchical storage management software and storage medium
CN113963763A (en) Partition changing method and device for medical data storage
CN113064704A (en) Task processing method and device, electronic equipment and computer readable medium
CN110598133A (en) Method, apparatus, electronic device, and computer-readable storage medium for determining an order of search items
CN111597439A (en) Information processing method and device and electronic equipment
CN113761075A (en) Method, device, equipment and computer readable medium for switching databases
CN110889055A (en) Interaction method, interaction system, electronic device and storage medium
CN111787043A (en) Data request method and device
CN111209479B (en) Object pushing method and device
CN110619093B (en) Method, apparatus, electronic device, and computer-readable storage medium for determining an order of search items
CN115314718B (en) Live broadcast data processing method, device, equipment and medium
CN110851192A (en) Method and device for responding to configuration of degraded switch
CN115221178B (en) Data table binding method, device, electronic equipment and computer readable medium
CN116360710B (en) Data storage method applied to server cluster, electronic device and readable medium
WO2024041566A1 (en) Information processing method and apparatus, and electronic device and storage medium
CN111294321B (en) Information processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant