invoke scheduler for every received packet

Roberto Nibali ratz at drugphish.ch
Thu Jul 13 08:28:41 BST 2006


Hi guys,

>> I have just briefly skimmed over this patch and am a bit unsure how
>> efficient it is, but it seems to not populate the template cache.  Why
>> invoking the scheduler module for every packet is special, I fail to
>> see. Having non-persistent scheduling to does the same. But I've only
>> looked at the patch for 2 minutes.
> 
> Isn't the difference that persistance acts on connections,
> where as this patch acts on packets? Of course, UDP ones,
> TCP would make no sense.

Yes, I reckon I must have been mislead by the wording used in "schedule" 
which I was referring to the kernel scheduler. What this patch does is 
(and we leave TCP out, since it does not make sense there):

srcIP:srcPort --> dstIP:dstPort --> call scheduler code (RR, LC, ...)
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing

Whereas the non-OPS patched kernel would do:

srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing

Is this more or less accurate? Also, the "do nothing" part is to my 
avail independent of LVS_DR or LVS_NAT, since UDP has no state 
transition to be handled.

> I seem to recall needing a similar feature myself at some stage.
> The usage that I had was related to SIP, where all packets seem to
> come and go to and from the same ports and addresses, but can actually
> be loadbalanced independantly.
> 
> I am quite happy to push this to DaveM for inclusion. After some reveiw
> of course. I will start with a review myself, the patch is taken from
> the archive.linuxvirtualserver.org URL above.

[How can one make email stuff visible after the '-- '-tag?]

 > diff -ruN linux-2.6.13.orig/include/net/ip_vs.h 
linux-2.6.13/include/net/ip_vs.h
 > > --- linux-2.6.13.orig/include/net/ip_vs.h	2005-09-28 
00:49:21.000000000 +0200
 > > +++ linux-2.6.13/include/net/ip_vs.h	2005-09-28 14:39:32.000000000 
+0200
 > > @@ -19,6 +19,7 @@
 > >   */
 > >  #define IP_VS_SVC_F_PERSISTENT	0x0001		/* persistent port */
 > >  #define IP_VS_SVC_F_HASHED	0x0002		/* hashed entry */
 > > +#define IP_VS_SVC_F_ONEPACKET	0x0004		/* one-packet scheduling */

Maybe "EACHPACKET"? Since the scheduler is invoked for each packet and 
not for one packet :). The same of course applies to the whole patch.

 > >
 > >  /*
 > >   *      Destination Server Flags
 > > @@ -84,6 +85,7 @@
 > >  #define IP_VS_CONN_F_IN_SEQ	0x0400		/* must do input seq adjust */
 > >  #define IP_VS_CONN_F_SEQ_MASK	0x0600		/* in/out sequence mask */
 > >  #define IP_VS_CONN_F_NO_CPORT	0x0800		/* no client port set yet */
 > > +#define IP_VS_CONN_F_ONE_PACKET	0x1000		/* forward only one packet */
 > >
 > >  /* Move it to better place one day, for now keep it unique */
 > >  #define NFC_IPVS_PROPERTY	0x10000
 > > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c 
linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c
 > > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c	2005-09-28 
00:49:23.000000000 +0200
 > > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c	2005-09-28 
02:54:55.000000000 +0200
 > > @@ -127,6 +127,9 @@
 > >  	unsigned hash;
 > >  	int ret;
 > >
 > > +	if (cp->flags & IP_VS_CONN_F_ONE_PACKET)
 > > +		return 0;
 > > +
 > >  	/* Hash by protocol, client address and port */
 > >  	hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
 > >
 > > @@ -275,6 +278,11 @@
 > >   */
 > >  void ip_vs_conn_put(struct ip_vs_conn *cp)
 > >  {
 > > +	if (cp->flags & IP_VS_CONN_F_ONE_PACKET) {
 > > +		ip_vs_conn_expire_now(cp);
 > > +		return;
 > > +	}
 > > +

Is it really needed to have a timer in case OPS is active? Or would it 
be even more effort to comment the timer code out for OPS-flagged 
packets, than just leaving it in.

 > >  	/* reset it expire in its timeout */
 > >  	mod_timer(&cp->timer, jiffies+cp->timeout);
 > >
 > > @@ -506,7 +514,7 @@
 > >  	/*
 > >  	 *	unhash it if it is hashed in the conn table
 > >  	 */
 > > -	if (!ip_vs_conn_unhash(cp))
 > > +	if (!ip_vs_conn_unhash(cp) && !(cp->flags & IP_VS_CONN_F_ONE_PACKET))
 > >  		goto expire_later;
 > >
 > >  	/*
 > > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c 
linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c
 > > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c	2005-09-28 
00:49:23.000000000 +0200
 > > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c	2005-09-28 
02:54:55.000000000 +0200
 > > @@ -215,6 +215,7 @@
 > >  	struct ip_vs_dest *dest;
 > >  	struct ip_vs_conn *ct;
 > >  	__u16  dport;	 /* destination port to forward */
 > > +	__u16 flags;

Since I'm working on Windows only right now for a customer, I have no 
easy access to kernel repositories. However, from looking at that 
snipped, I have two comments:

1.) Potential user space breakage?
2.) Namespace: We don't need another variable just being named "flags".
     In the past I've cleaned up some of this for my own tree because it
     was simple too cumbersome to find out to which struct a sprinkled
     flags variable in the code would belong.
     Make it ops_flags or cp_flags or whichever context it belongs to.

 > >  	__u32  snet;	 /* source network of the client, after masking */
 > >
 > >  	/* Mask saddr with the netmask to adjust template granularity */
 > > @@ -345,6 +346,9 @@
 > >  		dport = ports[1];
 > >  	}
 > >
 > > +	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
 > > +		 && iph->protocol == IPPROTO_UDP)?
 > > +		IP_VS_CONN_F_ONE_PACKET : 0;

 > Could this be changed to an if then construct?

Agreed.

 > > +	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
 > > +		 && iph->protocol == IPPROTO_UDP)?
 > > +		IP_VS_CONN_F_ONE_PACKET : 0;

 > Again, if then, please.

Again, agreed :).

 > >  	/*
 > >  	 *    Create a connection entry.
 > >  	 */
 > > @@ -419,7 +427,7 @@
 > >  			    iph->saddr, pptr[0],
 > >  			    iph->daddr, pptr[1],
 > >  			    dest->addr, dest->port?dest->port:pptr[1],
 > > -			    0,
 > > +			    flags,
 > >  			    dest);
 > >  	if (cp == NULL)
 > >  		return NULL;
 > > @@ -462,6 +470,9 @@
 > >  	    && (inet_addr_type(iph->daddr) == RTN_UNICAST)) {
 > >  		int ret, cs;
 > >  		struct ip_vs_conn *cp;
 > > +		__u16 flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
 > > +				iph->protocol == IPPROTO_UDP)?
 > > +				IP_VS_CONN_F_ONE_PACKET : 0;

Since this construct is in use at least 3 times already, how about a 
macro (ip_vs.h)?

 > > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_ctl.c	2005-09-28 
14:49:20.000000000 +0200
 > > @@ -1753,14 +1753,18 @@
 > >  		const struct ip_vs_dest *dest;
 > >
 > >  		if (iter->table == ip_vs_svc_table)
 > > -			seq_printf(seq, "%s  %08X:%04X %s ",
 > > +			seq_printf(seq, "%s  %08X:%04X %s%s ",
 > >  				   ip_vs_proto_name(svc->protocol),
 > >  				   ntohl(svc->addr),
 > >  				   ntohs(svc->port),
 > > -				   svc->scheduler->name);
 > > +				   svc->scheduler->name,
 > > +				   (svc->flags & IP_VS_SVC_F_ONEPACKET)?
 > > +				   " ops":"");
 > >  		else
 > > -			seq_printf(seq, "FWM  %08X %s ",
 > > -				   svc->fwmark, svc->scheduler->name);
 > > +			seq_printf(seq, "FWM  %08X %s%s ",
 > > +				   svc->fwmark, svc->scheduler->name,
 > > +				   (svc->flags & IP_VS_SVC_F_ONEPACKET)?
 > > +				   " ops":"");

 > I'm not entirely comforatle with this proc change. Won't it break
 > user-space compatiblility?

I'm not sure if it will break or simply ignore it. I believe that if you 
cat /proc/... OPS will be shown, however ipvsadm needs a patch to show 
it. This should be around anyway to honour setting the ONEPACKET flag to 
a service.

The other problem I see is that it heavily interferes with my work done 
on the session overflow code. I've also already enhanced that line 
heavily with more information. It's a pity that the ABI was never 
frozen. Ipvsadm actually should query all this information from the 
kernel, instead of parsing proc-fs output. And this preferably via 
netlink :).

Best regards,
Roberto Nibali, ratz
-- 
echo 
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

Search lvs-users Archives
Limit search to: Subject & Body Subject Author
Sort by: Reverse Sort

More information about the lvs-users mailing list