[lvs-users] Problem with Least-connection scheduling and MySql

Christian Frost chrfrost at cs.aau.dk
Tue May 5 15:12:10 BST 2009


Simon Horman wrote:
> On Mon, May 04, 2009 at 10:31:59AM +0200, Christian Frost wrote:
>   
>> Hi,
>> We have a setup including two real servers each of which runs an 
>> instance of MySql with the max_connections option set to 1000. In this 
>> setup we have run some performance tests with mysqlslap to determine 
>> the throughput of the setup. These tests involve simulating many 
>> simultaneous users querying the database. Under these conditions we have 
>> encountered some problems with the load balancer. Specifically, using 
>> ipvsadm -L -n to monitor the connections during the performance test 
>> there are initially many connections represented as inactive. After a 
>> few seconds the inactive connections are represented as active in the 
>> respective real server. This causes a problem when the Least-Connection 
>> Scheduling algorithm is used because the connections are not equally 
>> distributed between the two real hosts. The two real hosts are almost 
>> equal in terms 
>> of processing capacities.
>>
>> In the following the output of ipvsadm -L -n is shown which probably 
>> explains the problem better.
>>
>> ipvsadm -L -n a few seconds into the test simulating 200 MySql clients 
>> connecting simultaneously.
>>
>> IP Virtual Server version 1.2.1 (size=4096)
>> Prot LocalAddress:Port Scheduler Flags
>>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>> TCP  10.0.1.5:3306 lc
>>   -> 10.0.1.2:3306                Route   1      71         0
>>   -> 10.0.1.4:3306                Route   1      70         60
>>
>>
>> ipvsadm -L -n 30 seconds into the test simulating 200 MySql clients 
>> connecting simultaneously. Note that the load balancer uses the 
>> Least-Connection scheduling algorithm.
>>
>> IP Virtual Server version 1.2.1 (size=4096)
>> Prot LocalAddress:Port Scheduler Flags
>>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>> TCP  10.0.1.5:3306 lc
>>   -> 10.0.1.2:3306                Route   1      71         0
>>   -> 10.0.1.4:3306                Route   1      130        0
>>
>>
>> The problem does not occur if the connections are made sequentially and 
>> if the number of total connections is below about 100.
>>
>> Is there anything we can do to avoid these problems?
>>     
>
> Hi Christian,
>
> I'm taking a bit of a stab in the dark, but I think that the problem that
> you are seeing is with the lc (and wlc) algorithms' interaction with bursts
> of connections.
>
> I think that the core of the problem is the way that lc calculates the
> overhead of a server. This is relevant because an incoming connection is
> allocated to whichever real-server is deemed to have the lowest overhead
> at that time.
>
> In net/netfilter/ipvs/ip_vs_lc.c:ip_vs_lc_dest_overhead()
> overhead is calculated as:
>
> 	active_connections * 256 + inactive_connections
>
> So suppose that things are in a more or less balanced state,
> real-server A has 71 connections and real-server B has 70.
>
> Then a big burst of 60 new connections comes in.  The first of these new
> connections will go to real-server B, as expected. This connection will be
> in the inactive state until the 3 way handshake is complete. So far so good.
>
> Unfortunately, if the other 59 new connections come in before any of the
> other new connections complete the handshake and move into the active
> state, they will all be allocated to real-server B because:
>
>     71 * 256 + 0 > 70 * 256 + n
>     where: n < 256
>
> Assuming that I am correct I can think of two methods of addressing this
> problem:
>
> 1) Simply change 256 to a smaller value. In this case 256 basically
>    ends up being the granularity of balancing for bursts of connections.
>    And in the case at hand, clearly 256 is too coarse. Perhaps 8, 2 or
>    even 1 would be a better value.
>
>    This should be a trivial change to the code, and if lc is a module
>    you wouldn't even need to recompile the entire kernel - though you
>    would need to track down the original kernel source and config.
>
>    The main drawback of this is that if you have a lot of old, actually
>    dead, connections in the inactive state, then it might cause imbalance.
>
>    If that does help it might be good to consider making this parameter
>    configurable at run time, at least globally.
>
> 2) A more complex though arguably better approach would be to implement
>    some kind of slow start feature. That is, to assign some kind of weight
>    to new connections. I had a stab at this one in the past - it should
>    be in the archives - though I think my solution only addressed the
>    problem for active connections. But the idea seems reasonable
>    to extend to this problem.
>
>   
Hi,

We tried method 1, which turned out to balance the connections 
perfectly. We used a multiplier of 1 instead of 256.
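
The burst behaviour described in the quoted reply, and the effect of the
smaller multiplier, can be reproduced with a small simulation. This is only
a sketch in Python, not the kernel code: the function name simulate_burst
is made up for illustration, it assumes that no handshake completes while
the burst is arriving, and it assumes that a tie goes to the first server
listed.

```python
def simulate_burst(active, burst, factor):
    """Distribute `burst` new connections using an lc-style overhead.

    overhead = active_connections * factor + inactive_connections

    Assumption: every new connection stays inactive for the whole burst
    (no 3-way handshake completes in the meantime).
    Returns the number of new, still inactive, connections per server.
    """
    inactive = [0] * len(active)
    for _ in range(burst):
        overheads = [a * factor + i for a, i in zip(active, inactive)]
        # Assumption: on a tie the first server in the list is chosen.
        target = overheads.index(min(overheads))
        inactive[target] += 1
    return inactive


# Figures from the ipvsadm output: 71 and 70 active, burst of 60.
print(simulate_burst([71, 70], 60, 256))  # [0, 60]
print(simulate_burst([71, 70], 60, 1))    # [30, 30]
```

With a factor of 256 the whole burst lands on the second server, which
matches the InActConn column in the first ipvsadm listing. With a factor
of 1 the same burst is split evenly.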

Thank you.

/Christian



