[lvs-users] Port mapping with LVS-DR using fwmark

Jacoby Hickerson hickersonjl at gmail.com
Wed Jan 29 00:41:09 GMT 2014


Thanks Julian, this helps me understand it a lot better.  Are you suggesting
the masquerading (NAT) method? That isn't an ideal option for me, unless of
course it is the only option.

To see how much further I could get using DR, I removed the REDIRECT rules
and added the following DNAT rule to both real servers:
iptables -t nat -A PREROUTING -p tcp -m tcp --destination 172.17.0.24 \
  --dport 80 -j DNAT --to-destination 172.17.0.24:50000

After the DNAT change, the director now forwards packets to real server 2;
however, the port is not what the client expects.

The problem is that real server 2 receives the packets on the mapped port
50000 instead of port 80.
Here is the debug output when a connection is scheduled to real server 2:

Jan 28 23:58:57 pc01 kernel: IPVS: lookup service: fwm 100 TCP
172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:57 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
refcnt 1 weight 100
Jan 28 23:58:57 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38193 v:
172.17.0.24:50000 d:172.17.0.17:50000 fwd:R s:4 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38193 v:
172.17.0.24:50000 d:172.17.0.17:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: TCP input  [S...] 172.17.0.17:50000->
172.17.0.2:38193 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:41024->
172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:41024->
172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38193->
172.17.0.24:50000 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38193->
172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116

So we see above that the virtual address is 172.17.0.24:50000; ideally that
would be port 80.  Alternatively, the destination for RIP2 (172.17.0.17)
should be port 80.
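
To make the port easier to spot in the debug output, here is a rough sketch
of a filter that prints only the client, virtual and destination tuples from
the "Bind-dest" lines. The function name ipvs_bind_tuples is just something
I made up for illustration, and it assumes each log entry is on a single
line, as it is in the syslog file (the lines above are wrapped by the mail
client).

```shell
# Print "client virtual destination" for every IPVS Bind-dest debug line
# read from stdin. Hypothetical helper, not part of ipvsadm or the kernel.
# Assumes one log entry per line (unwrapped syslog output).
ipvs_bind_tuples() {
    sed -n -E 's/.*Bind-dest TCP c:([^ ]+) v: ?([^ ]+) d:([^ ]+) .*/\1 \2 \3/p'
}
```

Feeding it the syslog file (for example with "ipvs_bind_tuples < logfile",
where logfile is wherever your kernel messages end up) should give one line
per scheduled connection, so a virtual port of 50000 instead of 80 stands
out right away.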

The following is the tcpdump output on real server 2, showing that it
replies to the client from the unexpected mapped port 50000 (so the
connection hangs):
tcpdump -i any -nn port 80 or port 50000   # nothing on loopback, only bond0
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq
1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr
0,nop,wscale 7], length 0
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq
1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr
0,nop,wscale 7], length 0
23:58:59.446484 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq
59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val
353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446487 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq
59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val
353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq
1458168691, win 0, length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq
1458168691, win 0, length 0
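
The handshake is easier to follow when the capture is reduced to source,
destination and TCP flags. Below is a minimal sketch of such a filter; the
name tcp_flow_summary is hypothetical, it assumes unwrapped IPv4 tcpdump
lines in the format shown above, and it collapses consecutive identical
packets (every packet shows up twice in my capture).

```shell
# Reduce "tcpdump -nn" IPv4 TCP lines on stdin to "src > dst [flags]".
# Hypothetical helper for reading the capture, assuming one packet per line.
tcp_flow_summary() {
    awk '$2 == "IP" && $6 == "Flags" {
        sub(/:$/, "", $5)          # drop the colon after the destination
        sub(/,$/, "", $7)          # drop the comma after the flags
        line = $3 " > " $5 " " $7
        if (line != prev) print line   # skip consecutive duplicates
        prev = line
    }'
}
```

With the capture above this comes down to three lines: the SYN to port
50000, the SYN-ACK sent from port 50000, and the client's RST.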

Here is debug output when it connects to real server 1:
Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 100 TCP
172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:47 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
refcnt 1 weight 100
Jan 28 23:58:47 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38192 v:
172.17.0.24:50000 d:172.17.0.16:50000 fwd:R s:65276 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38192 v:
172.17.0.24:50000 d:172.17.0.16:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input  [S...] 172.17.0.16:50000->
172.17.0.2:38192 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50000->
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50000->
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 0 TCP
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38192->
172.17.0.24:50000 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38192->
172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input  [..A.] 172.17.0.16:50000->
172.17.0.2:38192 state: SYN_RECV->ESTABLISHED conn->refcnt:2

The tcpdump output on real server 1 shows that the connection between real
server 1 and the client works:
tcpdump -i any -nn port 80 or port 50000   # nothing on loopback, only bond0
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq
2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr
0,nop,wscale 7], length 0
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq
2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr
0,nop,wscale 7], length 0
23:58:47.241128 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq
709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val
353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241131 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq
709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val
353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780],
length 173
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780],
length 173
23:58:47.241443 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174,
win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241446 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174,
win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241569 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1,
ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241573 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1,
ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241944 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175,
win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0
23:58:47.241946 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175,
win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0

Thanks again for spending time debugging this.

Jacoby


On Tue, Jan 28, 2014 at 1:16 AM, Julian Anastasov <ja at ssi.bg> wrote:

>
>         Hello,
>
> On Mon, 27 Jan 2014, Jacoby Hickerson wrote:
>
> > Certainly and that makes sense, I will consolidate what I've emailed
> before
> > with the additional information here.
> >
> > # PC info: Linux 3.12.5 for real servers 1 and 2, and Linux 3.9.10 for
> the
> > client box.
> >
> > There are 3 boxes total, client box, director/RIP1( real server 1) and
> RIP2
> > (real server 2):
> > - client box:
> > inet 172.17.0.2/16 brd 172.17.255.255 scope global eth1   #CIP
> >
> > - director which is the same as real server 1 (RIP1).  The client is on a
> > separate box.
> > inet 172.17.0.16/16 brd 172.17.255.255 scope global bond0
> > #RIP1
> > inet 172.17.0.24/16 brd 172.17.255.255 scope global secondary bond0:2
> #VIP
> >
> > - real server 2 (RIP2)
> > inet 172.17.0.24/32 scope global lo:0                      #VIP on
> loopback
> > inet 172.17.0.17/16 brd 172.17.255.255 scope global bond0  #RIP2
> >
> > # ipvs setup on real server 1 (RIP1) only
> > ipvsadm -C
> > ipvsadm -A -f 100 -s rr
> > ipvsadm -a -f 100 -r 172.17.0.16 -w 100
> > ipvsadm -a -f 100 -r 172.17.0.17 -w 100
> >
> > # iptable rules (these rules are set for both real server 1 and real
> server
> > 2)
> > iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> > --dport 80 -j MARK --set-xmark 0x64/0xffffffff
> > iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
> > --to-ports 50000
> > iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
> > --to-ports 50000
> >
> > The test I'm conducting is an http get from the client box connecting to
> the
> > VIP:
> > - Issue the following command on the client box:
> > curl -v 'http://172.17.0.24'
> >
> > On both real servers there is an nginx webserver listening on port 50000
> >
> > I also turned on debugging and ran the curl command with port mapping
> using
> > level 12 debug (this is output when the issue occurs of no load
> balancing).
> > Debug output on real server 1 after executing the curl command the first
> > time:
> >
> > Jan 24 23:05:44 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns
> 0
> > refcnt 1 weight 100
>
>         The debug output was very helpful.
>
>         Looks like -j REDIRECT combined with DR is a bad idea.
> When packet comes to IPVS the daddr is already 172.17.0.16,
> see the "v:172.17.0.16" line below:
>
> > Jan 24 23:05:44 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37455
> > v:172.17.0.16:50130 d:172.17.0.17:50130 fwd:R s:65276 conn->flags:183
> > conn->refcnt:1 dest->refcnt:2
>
>         The remote real server 2 is not configured for
> such VIP (172.17.0.16). I don't remember when was
> -j REDIRECT used for IPVS setups, may be for transparent
> proxy setups.
>
>         Why not just use NAT method for both servers
> without any REDIRECT rules?
>
>         Even -j DNAT --to-destination VIP:50000 has better
> chance to use VIP instead of first IP.
>
> > Jan 24 23:05:44 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37455
> > v:172.17.0.16:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 24 23:05:44 pc01 kernel: IPVS: TCP input  [S...]
> > 172.17.0.17:50130->172.17.0.2:37455 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> > net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> > net/netfilter/ipvs/ip_vs_core.c line 1116
> > Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> > net/netfilter/ipvs/ip_vs_xmit.c line 1031
>
>         Above "ip_vs_xmit.c line 1031" means packet was
> sent to remote real server 2 (172.17.0.17) but due to
> -j REDIRECT the daddr is 172.17.0.16.
>
> ...
>
> > Debug output on real server 1 after executing the curl command a second
> > time:
> >
> > Jan 24 23:05:45 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> > Jan 24 23:05:45 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns
> 0
> > refcnt 1 weight 100
> > Jan 24 23:05:45 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37456
> > v:172.17.0.16:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
> > conn->refcnt:1 dest->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37456
> > v:172.17.0.16:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: TCP input  [S...]
> > 172.17.0.16:50130->172.17.0.2:37456 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> > net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> > net/netfilter/ipvs/ip_vs_core.c line 1116
>
>         No "ip_vs_xmit.c line 1031" here, packet was delivered
> locally with NF_ACCEPT, so it goes to local real server
> as per the "d:172.17.0.16" info.
>
> > Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP
> > 172.17.0.16:50130->172.17.0.2:37456 hit
>
> ...
>
> > Below is an example of good results when connecting directly to port
> 50000.
>
>         So, no -j REDIRECT => no problem?
>
> >  For this scenario I removed port 80 and updated iptables with fwmark for
> > port 50000:
> > iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> > --dport 50000 -j MARK --set-xmark 0x64/0xffffffff
> >
> > Debug output on real server 1 when not port mapping first test (curl -v
> > 'http://172.17.0.24:50000'):
> >
> > Jan 25 00:19:37 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> > Jan 25 00:19:37 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns
> 0
> > refcnt 1 weight 100
> > Jan 25 00:19:37 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42815
> > v:172.17.0.24:50130 d:172.17.0.17:50130 fwd:R s:4 conn->flags:183
> > conn->refcnt:1 dest->refcnt:2
>
>         Yep, "v:172.17.0.24" means no -j REDIRECT was used.
>
> > Jan 25 00:19:37 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42815
> > v:172.17.0.24:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 25 00:19:37 pc01 kernel: IPVS: TCP input  [S...]
> > 172.17.0.17:50130->172.17.0.2:42815 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> > net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 25 00:19:37 pc01 kernel: IPVS: new dst 172.17.0.17, src 172.17.0.16,
> > refcnt=1
> > Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> > net/netfilter/ipvs/ip_vs_core.c line 1116
> > Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> > net/netfilter/ipvs/ip_vs_xmit.c line 1031
>
> Regards
>
> --
> Julian Anastasov <ja at ssi.bg>
>

