CN119853888B - Method for reducing hash jitter caused by LAG member number change

Info

Publication number: CN119853888B
Application number: CN202510128243.1A
Authority: CN
Inventors: 陈朋; 树亚
Original assignee: Hefei Shenzhou Kuntai Information Technology Co ltd
Current assignee: Hefei Shenzhou Kuntai Information Technology Co ltd
Priority date: 2025-02-05
Filing date: 2025-02-05
Publication date: 2025-09-19
Anticipated expiration: 2045-02-05
Also published as: CN119853888A

Abstract

The invention discloses a method for reducing hash jitter caused by a change in the number of LAG members, and relates to the technical field of link aggregation. The method comprises six steps: step one, real-time monitoring; step two, early-warning start; step three, initial copy construction; step four, data migration; step five, node optimization; and step six, switching verification. By dynamically allocating virtual nodes, the method optimizes the data distribution mechanism of the consistent hash ring. When a change in the number of LAG members forces the hash mapping to be adjusted, the method avoids the data concentration or dispersion caused by simple redistribution, avoids migrating and rerouting large amounts of data, minimizes the impact of hash jitter on the stability of data transmission, preserves transmission continuity, reduces the packet loss rate, and provides reliable support for services with strict real-time requirements.

Description

Method for reducing hash jitter caused by LAG member number change

Technical Field

The invention relates to the technical field of link aggregation, in particular to a method for reducing hash jitter caused by the change of the number of LAG members.

Background

M-LAG (Multi-Chassis Link Aggregation, i.e., cross-device link aggregation) virtualizes two physical devices into one virtual M-LAG system at the aggregation layer, implementing cross-device link aggregation and thereby providing device-level redundancy protection and traffic load sharing.

In addition to the IPL link, the two devices also share a keep-alive link, which is used to detect the state of the neighbor: when the peer-link fails, keep-alive messages are exchanged to perform MAD (Multi-Active Detection), which prevents both M-LAG devices from running as the primary device at the same time.

The invention published as CN119070972A discloses a method and device for preventing hash polarization in multi-level cross-device link aggregation connections. As described there, the existing method comprises: recording the root-node hash algorithm; setting the root-node hash algorithm for each cross-device link aggregation group that takes effect; detecting polarized cross-device link aggregation groups; selecting a new hash algorithm for each polarized group; and setting that new hash algorithm for the group. However, conventional algorithms of this kind often cause hash jitter when the number of LAG members changes, which may lead to unstable data transmission, reduced efficiency, and unbalanced load.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a method for reducing hash jitter caused by a change in the number of LAG members, which solves the problems described above.

To achieve this purpose, the invention adopts the following technical solution: a method for reducing hash jitter caused by a change in the number of LAG members, comprising the following steps:

Step one, real-time monitoring: link heartbeat and port polling are used to monitor the status of LAG members around the clock and to accurately capture events such as a member joining, exiting, or failing;

Step two, early-warning start: as soon as a change in the number of members is detected, an early warning is triggered, the administrator is notified, the hash adjustment flow is started, and non-critical data traffic is suspended to reserve time for the adjustment;

Step three, initial copy construction: the carrying capacity of the new member is evaluated from its bandwidth and delay characteristics, a temporary hash table copy is constructed, and the data slot layout is preliminarily planned;

Step four, data migration: the data in the original hash table is sorted according to a preset priority; hot-spot and high-priority service data is moved first, in multiple small batches, to the new-member slots of the temporary copy; the integrity and consistency of the data are checked batch by batch, and each batch is marked as migrated once confirmed;

Step five, node optimization: the virtual node distribution of the consistent hash ring is recalculated according to the new LAG member configuration, virtual nodes are inserted for newly added members as required, and some of the nodes adjacent to old members are migrated;

Step six, switching verification: after the data in the temporary copy has been migrated and the hash ring is stable, the new hash table is enabled and data transmission is resumed; indicators such as throughput, delay, and packet loss rate are continuously observed and compared for verification, and if an abnormality occurs, the system rolls back to the original hash table in time and the adjustment is re-analyzed.

Preferably, in the first step, the system monitors the status of LAG members in real time and, through techniques such as link heartbeat detection and port status polling, promptly detects when a member joins, exits, or fails.

Preferably, in the second step, as soon as a change in the number of members is detected, an early-warning mechanism is triggered, a notification is sent to the system administrator, and the hash adjustment flow is started; at the same time, transmission of some non-critical data is suspended to gain a time window for the adjustment.

Preferably, in the third step, a temporary hash table copy is constructed, the carrying capacity of the new member is evaluated from its characteristics (such as bandwidth and delay), and data slots are preliminarily allocated.

Preferably, in the fourth step, the data in the original hash table is sorted according to a preset priority rule, and hot-spot data and high-priority service data are migrated first, in multiple small batches, to the slots of the new member in the temporary copy. After each batch is migrated, the integrity and consistency of the data are verified, and the batch is marked as migrated only once it is confirmed to be error-free.

Preferably, in the fifth step, the distribution of virtual nodes on the consistent hash ring is recalculated according to the new LAG member configuration. For a newly added member, a corresponding number of virtual nodes is inserted on the ring according to its performance parameters, and some of the virtual nodes adjacent to old members are migrated in a load-balanced manner. For an exiting member, the data carried by its virtual nodes is transferred smoothly to the virtual nodes of the other surviving members. The stability of the hash ring and the uniformity of the data distribution are maintained by dynamically adjusting the mapping between virtual nodes and actual LAG members.

Preferably, in the sixth step, after the data migration into the temporary copy is completed and the rebuilt hash ring is stable, the old and new hash tables are switched, the new hash table is formally put into use, and the suspended data transmission is resumed. For a period of time after the switch, the transmission performance, including indicators such as throughput, delay, and packet loss rate, is continuously monitored and compared with the data from before the switch to verify how effectively the algorithm suppresses hash jitter. If an abnormality is found, the system rolls back to the original hash table in time and the adjustment is re-analyzed.

Preferably, in the third step, when the initial number of virtual nodes is allocated to a LAG member, or when the number of virtual nodes is dynamically adjusted according to member performance, a weighted calculation is used. Let the bandwidth of the member be $B$, its processing capacity be $P$, and its delay be $D$, with corresponding weights $w_1$, $w_2$, and $w_3$; the composite score $S$ is calculated as a weighted combination of $B$, $P$, and $D$ under these weights.

Based on the composite score $S$, the number of virtual nodes $N$ is determined within the preset range $[N_{\min}, N_{\max}]$, using a linear relation:

$$N = k \cdot S + b$$

where $k$ and $b$ are constants predetermined from system experience and requirements.

Preferably, in the fourth step, when data is migrated from the original hash table to the temporary copy, the migration order is determined from the traffic priority $T_i$ of the data (e.g., measured by the number of accesses or the amount of data transferred over a period of time) and its recent access frequency $F_i$ (e.g., the number of accesses in the past hour). Let the composite migration priority of data item $i$ be $Q_i$; it is calculated as:

$$Q_i = \alpha \cdot T_i + \beta \cdot F_i$$

where $\alpha$ and $\beta$ are preset weight coefficients that balance the importance of traffic priority against recent access frequency.

The data is sorted in descending order of $Q_i$, and data items with high $Q_i$ values are migrated first.

Preferably, in the fifth step, when the number of members changes and the load needs to be redistributed (for example, when a member exits and the load of its virtual nodes is transferred to the other surviving members), weighted round-robin or a similar strategy is adopted. Let the set of surviving members be $M = \{m_1, m_2, \dots, m_n\}$, let the current load carrying capacity of member $m_i$ (determined jointly by bandwidth, remaining processing resources, and so on) be $C_i$, and let the load to be allocated be $L$. The load $L_i$ allocated to member $m_i$ is calculated as:

$$L_i = \frac{C_i}{\sum_{j=1}^{n} C_j} \cdot L$$

Advantageous Effects

The invention provides a method for reducing hash jitter caused by the change of the number of LAG members. Compared with the prior art, the method has the following beneficial effects:

1. By dynamically allocating virtual nodes, the method optimizes the data distribution mechanism of the consistent hash ring. When a change in the number of LAG members forces the hash mapping to be adjusted, it avoids the data concentration or dispersion caused by simple redistribution and avoids migrating and rerouting large amounts of data. This minimizes the impact of hash jitter on the stability of data transmission, preserves transmission continuity, reduces the packet loss rate, and provides reliable support for services with strict real-time requirements.

2. The method introduces an improved consistent hash ring on which each LAG member is assigned multiple virtual nodes. The number of virtual nodes is allocated dynamically according to member attributes such as bandwidth and processing capacity, so that members with higher bandwidth and stronger processing capacity receive more virtual nodes. When the number of members changes, the load is balanced by adjusting the distribution of virtual nodes instead of simply redistributing hash values, so data remains evenly distributed under the new LAG structure and the hash concentration or dispersion caused by adding or removing members is reduced.

Drawings

FIG. 1 is a schematic flow chart of the algorithm of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, the present invention provides a method for reducing hash jitter caused by LAG member number variation, comprising the following steps:

Step one, real-time monitoring: the system monitors the status of LAG members in real time and, through techniques such as link heartbeat detection and port status polling, promptly detects when a member joins, exits, or fails.
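As an illustration of how port status polling can surface membership changes, the following minimal sketch compares two consecutive polls and reports which members joined and which left. The function name `diff_members`, the port names, and the representation of a poll as a mapping from port name to link state are assumptions made for this example; they do not come from the patent.

```python
from typing import Dict, Set, Tuple


def diff_members(previous: Dict[str, bool],
                 current: Dict[str, bool]) -> Tuple[Set[str], Set[str]]:
    """Compare two polls (port name -> link is up) and return (joined, left)."""
    prev_up = {port for port, up in previous.items() if up}
    curr_up = {port for port, up in current.items() if up}
    return curr_up - prev_up, prev_up - curr_up


# Hypothetical poll results: eth2 went down, eth3 came up.
joined, left = diff_members(
    {"eth1": True, "eth2": True, "eth3": False},
    {"eth1": True, "eth2": False, "eth3": True},
)
print(sorted(joined), sorted(left))  # ['eth3'] ['eth2']
```

A non-empty `joined` or `left` set is the kind of event that would trigger the early warning of step two.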

Step two, early-warning start: as soon as a change in the number of members is detected, an early-warning mechanism is triggered, a notification is sent to the system administrator, and the hash adjustment flow is started; at the same time, transmission of some non-critical data is suspended to gain a time window for the adjustment.

Step three, initial copy construction: a temporary hash table copy is constructed, the carrying capacity of the new member is evaluated from its characteristics (such as bandwidth and delay), and data slots are preliminarily allocated.

When a change in the number of LAG members is detected, the algorithm does not immediately recalculate the hash mapping of all data; it adopts an incremental adjustment strategy instead. First, a temporary hash table copy is created and the newly added LAG members are assigned to idle slots in the copy. Part of the data is then migrated gradually from the original hash table to the new copy. The migration order is based on factors such as data traffic priority and recent access frequency: data with strict real-time requirements and frequent access is migrated first, so that critical data adapts quickly to the new link layout.

The available bandwidth of the new member is measured with a network test tool, the upper limit of its processing capacity is probed by applying a simulated data load, and its data transmission delay is obtained with a high-precision network probe.

When the initial number of virtual nodes is allocated to a LAG member, or when the number of virtual nodes is dynamically adjusted according to member performance, a weighted calculation is used. Let the bandwidth of the member be $B$, its processing capacity be $P$, and its delay be $D$, with corresponding weights $w_1$, $w_2$, and $w_3$; the composite score $S$ is calculated as a weighted combination of $B$, $P$, and $D$ under these weights.

According to the composite score, combined with the current load-balancing requirements of the system, a corresponding number of virtual nodes is allocated to the new member on the consistent hash ring and suitable insertion positions are selected. Position selection follows the principle of load dispersion: regions of the ring where virtual nodes are sparse and the load is relatively light are preferred, which avoids overloading any one region once the new member has joined.
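The scoring, sizing, and placement described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not reproduce the exact form of the composite score, so the sketch assumes normalized inputs and lets delay enter inversely (lower delay, higher score); the default weights, the constants `k` and `b`, and all function names are hypothetical. The widest empty arc between existing virtual nodes is used here as a simple stand-in for "a sparse, lightly loaded region", which ignores actual load.

```python
from typing import List


def composite_score(bandwidth: float, capacity: float, delay: float,
                    w1: float = 0.5, w2: float = 0.3, w3: float = 0.2) -> float:
    # ASSUMPTION: inputs are normalized, and delay enters inversely so that
    # a lower delay raises the score. The source does not give this form.
    return w1 * bandwidth + w2 * capacity + w3 / delay


def virtual_node_count(score: float, k: float, b: float,
                       n_min: int, n_max: int) -> int:
    # Linear relation N = k*S + b, clamped to the preset range [n_min, n_max].
    return max(n_min, min(n_max, round(k * score + b)))


def sparsest_position(positions: List[int], ring_size: int) -> int:
    # Midpoint of the widest empty arc between existing virtual nodes.
    # Expects at least one existing position.
    pts = sorted(positions)
    # Start with the arc that wraps around from the last point to the first.
    best_start = pts[-1]
    best_len = (pts[0] - pts[-1]) % ring_size or ring_size
    for a, nxt in zip(pts, pts[1:]):
        if nxt - a > best_len:
            best_start, best_len = a, nxt - a
    return (best_start + best_len // 2) % ring_size


score = composite_score(bandwidth=1.0, capacity=0.8, delay=2.0)
print(virtual_node_count(score, k=100, b=10, n_min=16, n_max=256))
print(sparsest_position([0, 10, 50], ring_size=100))  # 75
```

Clamping keeps a very strong or very weak member from dominating or vanishing from the ring, which matches the role of the preset range for N.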

Step four, data migration: the data in the original hash table is sorted according to a preset priority rule, and hot-spot data and high-priority service data are migrated first, in multiple small batches, to the slots of the new member in the temporary copy. After each batch is migrated, the integrity and consistency of the data are verified, and the batch is marked as migrated only once it is confirmed to be error-free.

When data is migrated from the original hash table to the temporary copy, the migration order is determined from the traffic priority $T_i$ of the data (e.g., measured by the number of accesses or the amount of data transferred over a period of time) and its recent access frequency $F_i$ (e.g., the number of accesses in the past hour). Let the composite migration priority of data item $i$ be $Q_i$; it is calculated as:

$$Q_i = \alpha \cdot T_i + \beta \cdot F_i$$

where $\alpha$ and $\beta$ are preset weight coefficients, and items with higher $Q_i$ are migrated first.

A temporary hash table copy is created, and the data slots it can carry are preliminarily planned according to the performance characteristics of the new member. The data in the original hash table is sorted according to preset rules such as data traffic priority and recent access frequency. Hot-spot data and high-priority service data, which have strict real-time requirements and are accessed frequently, are selected first and migrated step by step, in multiple small batches, to the slots of the corresponding new member in the temporary copy. After each batch is migrated, a data verification algorithm checks the integrity and consistency of the data, and the batch is marked as successfully migrated only once it is confirmed correct. This ensures that critical data adapts to the new link layout and that system operation continues seamlessly after the new member joins.
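The prioritized, batch-wise migration with per-batch verification might look like the sketch below. It assumes the composite priority is a weighted sum of traffic priority and recent access frequency, uses SHA-256 digests as a stand-in for the unspecified data verification algorithm, and stops at the first failed batch; the function names, weights, and sample data are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def migration_order(items: Dict[str, Tuple[float, float]],
                    alpha: float = 0.6, beta: float = 0.4) -> List[str]:
    # items maps key -> (traffic priority, recent access frequency).
    # ASSUMPTION: priority is the weighted sum alpha*T + beta*F.
    return sorted(items,
                  key=lambda k: alpha * items[k][0] + beta * items[k][1],
                  reverse=True)


def checksum(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()


def migrate(source: Dict[str, bytes], replica: Dict[str, bytes],
            order: List[str], batch_size: int) -> Set[str]:
    """Copy entries in priority order, verifying each small batch."""
    migrated: Set[str] = set()
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        for key in batch:
            replica[key] = source[key]
        if all(checksum(replica[k]) == checksum(source[k]) for k in batch):
            migrated.update(batch)      # mark the batch as migrated
        else:
            for k in batch:             # undo the failed batch and stop
                replica.pop(k, None)
            break
    return migrated


stats = {"a": (1.0, 1.0), "b": (5.0, 2.0), "c": (2.0, 9.0)}
source = {"a": b"alpha", "b": b"bravo", "c": b"charlie"}
replica: Dict[str, bytes] = {}
order = migration_order(stats)
print(order)  # ['c', 'b', 'a']
done = migrate(source, replica, order, batch_size=2)
```

Small batches bound how much work is lost when a verification fails, at the cost of more verification passes.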

Once a LAG member is detected to have left, the system quickly identifies all virtual nodes corresponding to that member on the consistent hash ring. A load-balancing scheduling algorithm then transfers the data carried by these virtual nodes smoothly to the virtual nodes of the other surviving members. The scheduling algorithm may use round-robin, weighted round-robin, or a dynamic allocation strategy based on real-time load monitoring. If weighted round-robin is used, for example, the weights must first be recalculated from the current performance indicators of the surviving members, so that data is distributed according to the remaining carrying capacity of each survivor, uneven data distribution caused by the departure is avoided, and the whole system keeps running stably.

Step five, node optimization: the distribution of virtual nodes on the consistent hash ring is recalculated according to the new LAG member configuration. For a newly added member, a corresponding number of virtual nodes is inserted on the ring according to its performance parameters, and some of the virtual nodes adjacent to old members are migrated in a load-balanced manner. For an exiting member, the data carried by its virtual nodes is transferred smoothly to the virtual nodes of the other surviving members. The stability of the hash ring and the uniformity of the data distribution are maintained by dynamically adjusting the mapping between virtual nodes and actual LAG members.

After the data transfer is completed, the system promptly removes the virtual nodes left behind by the departed member and optimizes and rebuilds the consistent hash ring. It re-examines the distribution and load of the remaining members, dynamically adjusts the density of virtual nodes according to the performance of each surviving member, and further optimizes how data flows around the ring, so that the system still runs efficiently with fewer members and data transmission remains stable and balanced.
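To make the benefit of the ring concrete, the sketch below builds a generic consistent hash ring with virtual nodes and removes one member. It is a textbook construction, not the patent's implementation: the class name `HashRing`, the use of SHA-256 for ring positions, the member names, and the fixed count of 100 virtual nodes per member are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_hash(key: str) -> int:
    # 64-bit ring position derived from SHA-256 (an arbitrary choice here).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self) -> None:
        self._points: List[int] = []      # sorted virtual node positions
        self._owner: Dict[int, str] = {}  # position -> LAG member

    def add(self, member: str, vnodes: int) -> None:
        for i in range(vnodes):
            point = ring_hash(f"{member}#{i}")
            if point in self._owner:      # skip the (unlikely) collision
                continue
            bisect.insort(self._points, point)
            self._owner[point] = member

    def remove(self, member: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != member]
        self._owner = {p: m for p, m in self._owner.items() if m != member}

    def lookup(self, flow_key: str) -> str:
        # First virtual node clockwise from the flow's position.
        i = bisect.bisect_right(self._points, ring_hash(flow_key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing()
for member in ("lag1", "lag2", "lag3"):
    ring.add(member, vnodes=100)

flows = [f"flow-{i}" for i in range(1000)]
before = {f: ring.lookup(f) for f in flows}
ring.remove("lag3")
after = {f: ring.lookup(f) for f in flows}
moved = {f for f in flows if before[f] != after[f]}
print(f"{len(moved)} of {len(flows)} flows were remapped")
```

Only flows that were previously carried by the removed member change their mapping; flows on the surviving members keep their links, which is the property that limits hash jitter when membership changes.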

When the number of members changes and the load needs to be redistributed (for example, when a member exits and the load of its virtual nodes is transferred to the other surviving members), weighted round-robin or a similar strategy is adopted. Let the set of surviving members be $M = \{m_1, m_2, \dots, m_n\}$, let the current load carrying capacity of member $m_i$ (determined jointly by bandwidth, remaining processing resources, and so on) be $C_i$, and let the load to be allocated be $L$. The load $L_i$ allocated to member $m_i$ is calculated as:

$$L_i = \frac{C_i}{\sum_{j=1}^{n} C_j} \cdot L$$
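A capacity-weighted split of the departing member's load can be sketched as below, assuming each survivor receives a share proportional to its current carrying capacity, which is the usual outcome of a capacity-weighted round-robin. The function name `allocate_load` and the sample capacities are hypothetical.

```python
from typing import Dict


def allocate_load(total_load: float,
                  capacities: Dict[str, float]) -> Dict[str, float]:
    """Split total_load across survivors in proportion to their capacity."""
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("surviving members have no remaining capacity")
    return {member: total_load * capacity / total_capacity
            for member, capacity in capacities.items()}


shares = allocate_load(600, {"m1": 10, "m2": 20, "m3": 30})
print(shares)  # {'m1': 100.0, 'm2': 200.0, 'm3': 300.0}
```

The shares always sum to the load being redistributed, so nothing carried by the exiting member is left unassigned.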

Step six, switching verification: after the data migration into the temporary copy is completed and the rebuilt hash ring is stable, the old and new hash tables are switched, the new hash table is formally put into use, and the suspended data transmission is resumed. For a period of time after the switch, the transmission performance, including indicators such as throughput, delay, and packet loss rate, is continuously monitored and compared with the data from before the switch to verify how effectively the algorithm suppresses hash jitter. If an abnormality is found, the system rolls back to the original hash table in time and the adjustment is re-analyzed.
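The comparison that drives the rollback decision could be sketched as follows. The text does not define what counts as an abnormality, so the relative tolerance of 10% and the absolute loss margin used here are assumptions, as are the names `Metrics`, `should_roll_back`, and `select_active_table`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metrics:
    throughput_mbps: float
    delay_ms: float
    loss_rate: float


def should_roll_back(before: Metrics, after: Metrics,
                     tolerance: float = 0.10,
                     loss_margin: float = 0.001) -> bool:
    # ASSUMPTION: an "abnormality" is a regression beyond these thresholds.
    return (after.throughput_mbps < before.throughput_mbps * (1 - tolerance)
            or after.delay_ms > before.delay_ms * (1 + tolerance)
            or after.loss_rate > before.loss_rate + loss_margin)


def select_active_table(old_table: Dict, new_table: Dict,
                        before: Metrics, after: Metrics) -> Dict:
    # Keep the new table unless the post-switch metrics regressed.
    return old_table if should_roll_back(before, after) else new_table


baseline = Metrics(throughput_mbps=1000, delay_ms=2.0, loss_rate=0.001)
healthy = Metrics(throughput_mbps=990, delay_ms=2.1, loss_rate=0.001)
degraded = Metrics(throughput_mbps=800, delay_ms=2.0, loss_rate=0.001)
print(should_roll_back(baseline, healthy))   # False
print(should_roll_back(baseline, degraded))  # True
```

Retaining the original hash table until verification passes is what makes the rollback cheap: switching back is a pointer change, not a second migration.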

In an embodiment of the invention, when a change in the number of LAG members is detected, whether a member has joined or a member has exited:

For a newly added member, its performance indicators are evaluated and its composite score is calculated using the weighting method. Then, according to the current load and overall performance distribution of the system, suitable positions are found on the consistent hash ring to insert the corresponding number of virtual nodes. The insertion positions are chosen to balance the load across the regions of the ring as far as possible, so that no region becomes overloaded. For example, if one region of the ring already holds many virtual nodes belonging to heavily loaded members, the virtual nodes of the new member are preferentially inserted into a relatively idle region.

For an exiting member, the system quickly locates all of its virtual nodes and, following the load-balancing principle, transfers the data they carry smoothly to the virtual nodes of the other surviving members. The load-balancing principle may be based on factors such as data traffic and current node load; for example, round-robin or weighted round-robin may be used to distribute the virtual node load of the exiting member among the virtual nodes of surviving members that have remaining carrying capacity, so that the data distribution stays uniform.

In an embodiment of the invention, the system periodically (e.g., at intervals of T) re-evaluates the performance of LAG members during operation, because the actual performance of a member may change with the network environment, equipment aging, and similar factors. The indicators are re-evaluated and the composite score is recalculated in the same way as in the initial allocation stage.

In an embodiment of the invention, the new composite score is compared against the current number of virtual nodes, and the number of virtual nodes is adjusted if the member's performance is found to have improved or degraded beyond a certain degree. If performance has improved, virtual nodes are added to the consistent hash ring and the positions of the new virtual nodes are chosen so that the ring remains stable. If performance has degraded, virtual nodes are removed and the data they carried is migrated steadily to other suitable virtual nodes, maintaining efficient and stable data transmission.
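The "only adjust when performance has changed to a certain degree" rule can be sketched as a simple dead band around the current virtual node count. The 20% threshold and the function name `adjust_vnodes` are assumptions for illustration; the target count is taken as an input, computed elsewhere from the new composite score.

```python
def adjust_vnodes(current: int, target: int, threshold: float = 0.2) -> int:
    """Return the virtual node count to use after a periodic re-evaluation.

    The count changes only when the target derived from the new composite
    score differs from the current count by at least `threshold` (relative).
    """
    if current > 0 and abs(target - current) / current < threshold:
        return current  # change too small: leave the ring untouched
    return target


print(adjust_vnodes(100, 110))  # 100 (within the dead band)
print(adjust_vnodes(100, 130))  # 130 (performance improved enough)
print(adjust_vnodes(100, 60))   # 60  (performance degraded enough)
```

Ignoring small fluctuations matters here because every change to the virtual node count remaps some flows, and frequent remapping is itself a source of hash jitter.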

Meanwhile, the contents which are not described in detail in the specification belong to the prior art known to the person skilled in the art, and model parameters of each electric appliance are not particularly limited and conventional equipment can be used.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for reducing hash jitter caused by changes in the number of LAG members, comprising the following steps:

Step 1: Real-time monitoring: Use link heartbeat and port polling technologies to closely monitor LAG member status around the clock and accurately capture member joining, exiting, or failure dynamics.

Step 2: Alert activation: When a change in the number of members is detected, an alert is triggered immediately, notifying the administrator, starting the hash adjustment process, and suspending the flow of non-critical data to reserve time for adjustment;

Step 3: Initial replica creation: Evaluate the new member's load capacity based on its bandwidth and latency characteristics, build a temporary hash table replica, and preliminarily plan the data slot layout.

Step 4: Data migration: Sort the original hash table data according to the preset priority, and move hot-spot and high-priority service data first to the corresponding new-member slots of the temporary replica in multiple small batches. Verify the data integrity and consistency of each batch, and mark the migration complete after confirmation;

Step 5: Node optimization: Recalculate the virtual node distribution of the consistent hash ring according to the new LAG member configuration, insert virtual nodes for new members as needed, and migrate some nodes adjacent to old members; when a member exits, smoothly transfer the load of its corresponding nodes to the nodes of the surviving members. For new members, insert a reasonable number of virtual nodes on the ring based on their performance parameters, and migrate some virtual nodes adjacent to old members in a load-balanced manner; for exiting members, smoothly transfer the data carried by their corresponding virtual nodes to the virtual nodes of other surviving members. By dynamically adjusting the mapping relationship between virtual nodes and actual LAG members, the stability of the hash ring and the uniformity of data distribution are maintained;

Step 6: Switch verification: After the temporary replica data is migrated and the hash ring is stable, enable the new hash table and resume data transmission. Continuously monitor the throughput, latency, and packet loss rate indicators, compare and verify them, and roll back the original hash table and re-analyze and adjust them in case of any anomalies.

2. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 1, the system monitors the status of LAG members in real time, and promptly detects the joining, leaving, or failure of members through link heartbeat detection and port status polling technology.

3. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 2, once a change in the number of members is detected, an early warning mechanism is immediately triggered, a notification is sent to the system administrator, and a hash adjustment process is initiated; at the same time, the transmission of some non-critical data is suspended to gain a time window for adjustment.

4. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 3, a temporary hash table copy is constructed, the carrying capacity of the new member is evaluated based on its characteristics, and data slots are preliminarily allocated.

5. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 4, the data in the original hash table is sorted according to a preset priority rule, and hot-spot data and high-priority service data are preferentially migrated, in multiple small batches, to the slots of the new members corresponding to the temporary replica. After each batch is migrated, the integrity and consistency of the data are verified, and once it is confirmed to be error-free, the batch is marked as migrated.

6. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 6, after the data migration in the temporary copy is completed and the hash ring reconstruction is stable, the old and new hash tables are switched, the new hash table is officially put into use, and the suspended data transmission is resumed at the same time. For a period of time after the switch, the data transmission performance is continuously monitored, including indicators of throughput, delay and packet loss rate, and compared with the data before the switch to verify the algorithm's suppression effect on hash jitter. If an abnormality is found, the system rolls back to the original hash table in time, and the adjustment is re-analyzed.

7. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, wherein: in step 4, when migrating data from the original hash table to the temporary copy, the migration order is determined according to the traffic priority $T_i$ of the data and its recent access frequency $F_i$; let the comprehensive migration priority of data item $i$ be $Q_i$, calculated as:

$$Q_i = \alpha \cdot T_i + \beta \cdot F_i$$

where $\alpha$ and $\beta$ are preset weight coefficients used to balance the importance of traffic priority and recent access frequency;

the data is sorted in descending order of $Q_i$, and data items with high $Q_i$ values are migrated first.

8. The method for reducing hash jitter caused by changes in the number of LAG members according to claim 1, characterized in that: in step 5, when the number of members changes and load redistribution is required, after a member exits, its virtual node load is transferred to other surviving members using weighted round-robin; assuming that the set of surviving members is $M = \{m_1, m_2, \dots, m_n\}$, the current load carrying capacity of member $m_i$ is $C_i$, and the load to be distributed is $L$, the load $L_i$ distributed to member $m_i$ is calculated as follows:

$$L_i = \frac{C_i}{\sum_{j=1}^{n} C_j} \cdot L$$