From: nicolas b. <sl1...@gm...> - 2014-02-27 09:43:13
|
hello,
first, here's how I initialize the address where I need to write:
static void new_osc_sender(struct common_Nodes* recordedNodes) {
    oscWriter_t* this_W = NULL;
    struct common_Nodes *node, *tmp;
    node = tmp = NULL;
    char* buff = NULL;
    if(NULL != recordedNodes)
    {
        if(NULL == (buff = GenAlloc(ALLOCATION_BLOCK_SIZE))) {
            Indic_Error(UNEXPECTED, 0);
            return;
        }
        HASH_ITER(hh, recordedNodes, node, tmp) {
            if(NULL != (this_W = dl_SysAlloc(sizeof(*this_W))))
            {
                sprintf(buff, "%d", node->port);
                this_W->addr = lo_address_new(node->ipData , buff);
                this_W->addr_str = dupstr(node->ipData);
                this_W->port_str = dupstr(buff);
                this_W->isUsed = node->idt;
                //printf("add %s %s\n", this_W->addr_str, this_W->port_str);
                LL_APPEND(writerServer,this_W);
            }
        }
        FRI(buff);
    }
    return;
}
so it's simply a lo_address_new() and I reuse the address from this_W->addr.
2014-02-26 13:38 GMT+01:00 Stephen Sinclair <rad...@gm...>:
> On Wed, Feb 26, 2014 at 11:50 AM, nicolas bats <sl1...@gm...> wrote:
> > Hi,
> > with the latest release (but I don't think it's 0.28 related) I can see
> > some behaviors on win that are not particularly desired...
> >
> > let me explain:
> > my app creates a server with lo_server_new(), so far so good, and I use
> > only UDP.
> > the wifi is off and the firewalls are off.
> > the network card address is 192.168.1.30
> >
> > if I send an OSC message to an address like 10.x.x.x, as the network can't
> > be reached I don't see any lag, but if I send an OSC message to 192.168.1.x
> > and there's no receiver, I see a huge lag, as if there's a timeout or
> > something like this.
> > did you guys notice the same behavior?
> > any leads in order to find which call is responsible for the lag?
>
> It would be good to know if it's a write() call for example. Also,
> does the lag occur only the first time, or every time?
within my app, it happens all the time
> I wonder if
> it's more of a name resolution bug, could be something regarding
> initializing a lo_address.
>
> What is "huge lag" exactly? A few ms, a few seconds, a minute or more?
>
let's say a few seconds, but more than your log shows
>
> Possibly stepping through the code in gdb starting at lo_send() might help.
>
> Here on my Windows 7 machine I do see a short lag of a couple of
> seconds when sending to an unknown address using oscsend.exe, but it
> seems to go away the second and third times I run it. If I change the
> last number of the IP address I get the lag again, only once.
even once it's a shame...
can't we get rid of that?
> Here is
> a dump:
>
> (This is compiled with MingW, and one of my NICs is configured to
> 192.168.56.1 / 255.255.255.0)
>
> -------------------------------
> bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test
>
> real 0m2.644s
> user 0m0.000s
> sys 0m0.015s
> bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test
>
> real 0m0.037s
> user 0m0.015s
> sys 0m0.000s
> bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test
>
> real 0m0.037s
> user 0m0.015s
> sys 0m0.000s
> bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test
>
> real 0m2.596s
> user 0m0.000s
> sys 0m0.015s
> bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test
>
> real 0m0.038s
> user 0m0.000s
> sys 0m0.015s
> bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test
>
> real 0m0.040s
> user 0m0.000s
> sys 0m0.015s
> -------------------------------
>
for the same address, mine shows 0m2.928s (more or less) every time.
the same as with new address
++
Nicolas
>
>
> Steve
>
>
> _______________________________________________
> liblo-devel mailing list
> lib...@li...
> https://lists.sourceforge.net/lists/listinfo/liblo-devel
>
|
|
From: nicolas b. <sl1...@gm...> - 2014-02-26 12:59:11
|
Hi,
Thanks for your answers. I'm at the theater right now for my daily work, so I could not investigate further.
I'll try oscsend.exe and tell you the results.
I'll also have a look at the lo_address.
Thanks guys,
Nicolas

On 26 Feb 2014 13:48, "Stephen Sinclair" <rad...@gm...> wrote:
> On Wed, Feb 26, 2014 at 1:40 PM, IOhannes m zmoelnig <zmo...@ie...> wrote:
> > On 2014-02-26 13:38, Stephen Sinclair wrote:
> >> I do see a short lag of a couple of seconds when sending to an
> >> unknown address using oscsend.exe, but it seems to go away the
> >> second and third times I run it. If I change the last number of
> >> the IP address I get the lag again, only once.
> >
> > this sounds suspiciously like a DNS problem. (i'm aware that we are
> > talking about IP addresses here)
>
> Agreed. I think lo_address_resolve() gets called regardless of whether
> the host is specified as a dotted quad or as a hostname. This causes
> getaddrinfo() to be called. Not sure why this might cause a DNS
> lookup. Maybe getnameinfo() is being triggered somewhere. Sure wish
> Windows had a strace() equivalent ;)
>
> Steve
|
|
From: Stephen S. <rad...@gm...> - 2014-02-26 12:48:07
|
On Wed, Feb 26, 2014 at 1:40 PM, IOhannes m zmoelnig <zmo...@ie...> wrote:
> On 2014-02-26 13:38, Stephen Sinclair wrote:
>> I do see a short lag of a couple of seconds when sending to an
>> unknown address using oscsend.exe, but it seems to go away the
>> second and third times I run it. If I change the last number of
>> the IP address I get the lag again, only once.
>
> this sounds suspiciously like a DNS problem. (i'm aware that we are
> talking about IP addresses here)

Agreed. I think lo_address_resolve() gets called regardless of whether
the host is specified as a dotted quad or as a hostname. This causes
getaddrinfo() to be called. Not sure why this might cause a DNS
lookup. Maybe getnameinfo() is being triggered somewhere. Sure wish
Windows had a strace() equivalent ;)

Steve
|
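Editor's sketch, not part of the thread: one way to check whether name resolution is involved is to call getaddrinfo() with the AI_NUMERICHOST and AI_NUMERICSERV flags, which restrict it to parsing numeric strings. The helper name resolve_numeric_udp is hypothetical, the code assumes a POSIX system (on Windows the Winsock headers would be needed instead), and it says nothing about what lo_address_resolve() actually does internally.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request POSIX.1-2001 declarations */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper (not part of liblo): resolve host/port for UDP
 * without allowing any name-service lookup. Returns 0 when `host` is a
 * numeric address and `port` a numeric service, otherwise a non-zero
 * getaddrinfo() error code. */
int resolve_numeric_udp(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;
    int err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 literals */
    hints.ai_socktype = SOCK_DGRAM;  /* UDP, as used in the thread */
    /* These flags forbid hostname and service-name lookups, so the call
     * only parses the strings it is given. */
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    err = getaddrinfo(host, port, &hints, &result);
    if (err == 0)
        freeaddrinfo(result);
    return err;
}
```

If a call such as resolve_numeric_udp("192.168.56.20", "9000") returns quickly while the full send path still lags, the delay is probably somewhere other than address parsing.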
|
From: IOhannes m z. <zmo...@ie...> - 2014-02-26 12:41:22
|
On 2014-02-26 13:38, Stephen Sinclair wrote:
> I do see a short lag of a couple of seconds when sending to an
> unknown address using oscsend.exe, but it seems to go away the
> second and third times I run it. If I change the last number of
> the IP address I get the lag again, only once.

this sounds suspiciously like a DNS problem. (i'm aware that we are
talking about IP addresses here)

gfdmart
IOhannes
|
|
From: Stephen S. <rad...@gm...> - 2014-02-26 12:38:42
|
On Wed, Feb 26, 2014 at 11:50 AM, nicolas bats <sl1...@gm...> wrote:
> Hi,
> with the latest release (but I don't think it's 0.28 related) I can see some
> behaviors on win that are not particularly desired...
>
> let me explain:
> my app creates a server with lo_server_new(), so far so good, and I use only
> UDP.
> the wifi is off and the firewalls are off.
> the network card address is 192.168.1.30
>
> if I send an OSC message to an address like 10.x.x.x, as the network can't be
> reached I don't see any lag, but if I send an OSC message to 192.168.1.x and
> there's no receiver, I see a huge lag, as if there's a timeout or
> something like this.
> did you guys notice the same behavior?
> any leads in order to find which call is responsible for the lag?

It would be good to know if it's a write() call for example. Also,
does the lag occur only the first time, or every time? I wonder if
it's more of a name resolution bug, could be something regarding
initializing a lo_address.

What is "huge lag" exactly? A few ms, a few seconds, a minute or more?

Possibly stepping through the code in gdb starting at lo_send() might help.

Here on my Windows 7 machine I do see a short lag of a couple of
seconds when sending to an unknown address using oscsend.exe, but it
seems to go away the second and third times I run it. If I change the
last number of the IP address I get the lag again, only once.

Here is a dump:

(This is compiled with MingW, and one of my NICs is configured to
192.168.56.1 / 255.255.255.0)

-------------------------------
bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test

real 0m2.644s
user 0m0.000s
sys 0m0.015s
bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test

real 0m0.037s
user 0m0.015s
sys 0m0.000s
bash-3.1$ time ./oscsend.exe 192.168.56.20 9000 /test

real 0m0.037s
user 0m0.015s
sys 0m0.000s
bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test

real 0m2.596s
user 0m0.000s
sys 0m0.015s
bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test

real 0m0.038s
user 0m0.000s
sys 0m0.015s
bash-3.1$ time ./oscsend.exe 192.168.56.21 9000 /test

real 0m0.040s
user 0m0.000s
sys 0m0.015s
-------------------------------

Steve
|
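Editor's sketch, not part of the thread: to find out which call is responsible for the lag inside an application, rather than timing the whole oscsend.exe process, each suspect step can be wrapped with a monotonic clock. The helper names elapsed_ms and time_step_ms are hypothetical, and the code assumes a platform that provides clock_gettime() with CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request clock_gettime() declarations */
#endif
#include <time.h>

/* Milliseconds elapsed between two monotonic timestamps. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Hypothetical helper: run step(arg) once and return how long it took,
 * in milliseconds. */
double time_step_ms(void (*step)(void *), void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    step(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

Wrapping the address creation and the send in separate step functions and printing both durations would show whether the seconds are spent before or during the actual write.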
|
From: nicolas b. <sl1...@gm...> - 2014-02-26 10:50:45
|
Hi,
with the latest release (but I don't think it's 0.28 related) I can see some
behaviors on win that are not particularly desired...

let me explain:
my app creates a server with lo_server_new(), so far so good, and I use only
UDP.
the wifi is off and the firewalls are off.
the network card address is 192.168.1.30

if I send an OSC message to an address like 10.x.x.x, as the network can't be
reached I don't see any lag, but if I send an OSC message to 192.168.1.x and
there's no receiver, I see a huge lag, as if there's a timeout or
something like this.
did you guys notice the same behavior?
any leads in order to find which call is responsible for the lag?

best regards,
nicolas
|
|
From: Stephen S. <rad...@gm...> - 2014-02-10 23:13:15
|
On Mon, Feb 10, 2014 at 11:55 PM, Felipe Sateler <fsa...@gm...> wrote:
> It appears I have not been clear. The testsuite was not passing correctly on
> 0.26, because it was not being run.
>
> I enabled it in 0.28 after noticing the testsuite did not pass in sparc, to
> prevent building liblo in such cases.

Ah, I see! Thanks for the explanation re 0.26, yes I failed to
understand that ;-)

> I suppose the REUSEPORT issue would be problematic only for people that
> compile with new glibc but run on an old kernel. Not sure how many of those
> are there.

And it would only be an issue for multicast users which are even
fewer, I think. With the fix I committed I hope it is not an issue
anyways.

Let's go with this then, thanks for all your work!

Steve
|
|
From: Felipe S. <fsa...@gm...> - 2014-02-10 22:55:17
|
It appears I have not been clear. The testsuite was not passing correctly on
0.26, because it was not being run.

I enabled it in 0.28 after noticing the testsuite did not pass in sparc, to
prevent building liblo in such cases.

I suppose the REUSEPORT issue would be problematic only for people that
compile with new glibc but run on an old kernel. Not sure how many of those
are there.

On Feb 10, 2014 7:31 PM, "Stephen Sinclair" <rad...@gm...> wrote:
> On Mon, Feb 10, 2014 at 5:56 PM, Felipe Sateler <fsa...@gm...> wrote:
> > On Sun, Feb 9, 2014 at 12:44 PM, Stephen Sinclair <rad...@gm...> wrote:
> >> Hello,
> >>
> >> I added a couple of patches to ignore errors from SO_REUSEPORT, and to
> >> refactor testlo, and allow disabling of network-based tests using a
> >> configure flag, --disable-network-tests
> >>
> >> I added a bunch of other flags while I was at it.
> >> (--disable-examples, --disable-tools, etc)
> >>
> >> While working on testlo I did come across a couple of places that
> >> might have stalled forever if messages got blocked by a firewall, so I
> >> added timeouts to these places.
> >
> > Excellent. I picked the patches and uploaded disabling the network
> > tests. Now liblo is building everywhere except sparc (intentional) and
> > ia64 (no longer a release architecture, has old gcc) so far.
>
> Great news. Yeah I guess since the changes are strictly to testlo and
> configure, and don't affect installed headers or the ABI, it should be
> fine to just include it as a patch on the source package. I guess I'd
> rather see if any other issues crop up when it gets into Debian and
> people start using it with real software, before stamping out a 0.29
> just for this change.
>
> On the other hand the "fix" for SO_REUSEPORT might be significant
> enough to warrant it, if it truly is an issue on Linux. It might be
> worth adding an explicit configure flag to disable it if we can't come
> up with a cleaner way to detect whether it is needed, or just enable
> it explicitly for Darwin (via uname).
>
> There seems to be little information out there available regarding
> guidelines for SO_REUSEPORT in the context of multicast.
>
> >> I wish I knew how to reproduce the
> >> network-restricted setup of the buildd machines though.
> >
> > Yes, unfortunately it seems to be difficult to reproduce.
>
> Yeah.. still I'm pretty curious what aspect of the software changed,
> since 0.26 was passing without issues. There have been a lot of
> changes, including to TCP and multicast support, so it's hard to
> guess. The only sane approach would be to do a kind of "git bisect"
> type of testing but this doesn't seem feasible with buildd.
>
> Oh well, I think our solution makes sense anyways, given the network
> restrictions on buildd. It would be nice to add more tests to testlo
> that exercise more code paths without triggering network system calls.
> But that might have to wait for a major cleanup/refactoring some time
> in the future.
>
> Steve
|
|
From: Stephen S. <rad...@gm...> - 2014-02-10 22:31:10
|
On Mon, Feb 10, 2014 at 5:56 PM, Felipe Sateler <fsa...@gm...> wrote:
> On Sun, Feb 9, 2014 at 12:44 PM, Stephen Sinclair <rad...@gm...> wrote:
>> Hello,
>>
>> I added a couple of patches to ignore errors from SO_REUSEPORT, and to
>> refactor testlo, and allow disabling of network-based tests using a
>> configure flag, --disable-network-tests
>>
>> I added a bunch of other flags while I was at it.
>> (--disable-examples, --disable-tools, etc)
>>
>> While working on testlo I did come across a couple of places that
>> might have stalled forever if messages got blocked by a firewall, so I
>> added timeouts to these places.
>
> Excellent. I picked the patches and uploaded disabling the network
> tests. Now liblo is building everywhere except sparc (intentional) and
> ia64 (no longer a release architecture, has old gcc) so far.

Great news. Yeah I guess since the changes are strictly to testlo and
configure, and don't affect installed headers or the ABI, it should be
fine to just include it as a patch on the source package. I guess I'd
rather see if any other issues crop up when it gets into Debian and
people start using it with real software, before stamping out a 0.29
just for this change.

On the other hand the "fix" for SO_REUSEPORT might be significant
enough to warrant it, if it truly is an issue on Linux. It might be
worth adding an explicit configure flag to disable it if we can't come
up with a cleaner way to detect whether it is needed, or just enable
it explicitly for Darwin (via uname).

There seems to be little information out there available regarding
guidelines for SO_REUSEPORT in the context of multicast.

>> I wish I knew how to reproduce the
>> network-restricted setup of the buildd machines though.
>
> Yes, unfortunately it seems to be difficult to reproduce.

Yeah.. still I'm pretty curious what aspect of the software changed,
since 0.26 was passing without issues. There have been a lot of
changes, including to TCP and multicast support, so it's hard to
guess. The only sane approach would be to do a kind of "git bisect"
type of testing but this doesn't seem feasible with buildd.

Oh well, I think our solution makes sense anyways, given the network
restrictions on buildd. It would be nice to add more tests to testlo
that exercise more code paths without triggering network system calls.
But that might have to wait for a major cleanup/refactoring some time
in the future.

Steve
|
|
From: Felipe S. <fsa...@gm...> - 2014-02-10 16:57:18
|
On Sun, Feb 9, 2014 at 12:44 PM, Stephen Sinclair <rad...@gm...> wrote:
> Hello,
>
> I added a couple of patches to ignore errors from SO_REUSEPORT, and to
> refactor testlo, and allow disabling of network-based tests using a
> configure flag, --disable-network-tests
>
> I added a bunch of other flags while I was at it.
> (--disable-examples, --disable-tools, etc)
>
> While working on testlo I did come across a couple of places that
> might have stalled forever if messages got blocked by a firewall, so I
> added timeouts to these places.

Excellent. I picked the patches and uploaded disabling the network
tests. Now liblo is building everywhere except sparc (intentional) and
ia64 (no longer a release architecture, has old gcc) so far.

> I wish I knew how to reproduce the
> network-restricted setup of the buildd machines though.

Yes, unfortunately it seems to be difficult to reproduce.

--
Saludos,
Felipe Sateler
|
|
From: Stephen S. <rad...@gm...> - 2014-02-09 15:44:29
|
Hello,

I added a couple of patches to ignore errors from SO_REUSEPORT, and to
refactor testlo, and allow disabling of network-based tests using a
configure flag, --disable-network-tests

I added a bunch of other flags while I was at it.
(--disable-examples, --disable-tools, etc)

While working on testlo I did come across a couple of places that
might have stalled forever if messages got blocked by a firewall, so I
added timeouts to these places. I wish I knew how to reproduce the
network-restricted setup of the buildd machines though.

Steve

On Thu, Feb 6, 2014 at 11:08 PM, Felipe Sateler <fsa...@gm...> wrote:
> On Thu, Feb 6, 2014 at 6:58 PM, Stephen Sinclair <rad...@gm...> wrote:
>> On Thu, Feb 6, 2014 at 9:17 PM, Felipe Sateler <fsa...@gm...> wrote:
>>> On Wed, Feb 5, 2014 at 1:02 PM, Felipe Sateler <fsa...@gm...> wrote:
>>>> Note that there is also a chance that the host setup might affect
>>>> builds (say, firewall rules).
>>>
>>> OK, I tested running testlo in a private network namespace (that is,
>>> no internet access other than localhost) and the test fails gracefully
>>> (ie, it doesn't hang forever).
>>>
>>> I think it could be that a firewall is dropping packets, is it
>>> possible that liblo waits forever for a message to appear?
>>
>> I'm not aware of anywhere off the top of my head where it should do
>> that (I explicitly programmed some timeouts in some places to avoid
>> that kind of thing), however it definitely sounds like a possible
>> explanation. The complexities of testing networking software! I
>> think it's reasonable to give some flags to testlo to make it avoid
>> any tests that don't use localhost, or even be able to run only the
>> non-sending/receiving code. (Exercise the validation/serialisation
>> paths for example, without actually inducing any network activity.)
>>
>> This will take a bit of work however. Ideally, something I've thought
>> for a very long time but procrastinated on heavily, is that testlo
>> should be split up into smaller programs or at least more modular
>> subroutines.
>
> Or use a proper unit testing framework ;). I totally understand the
> procrastination, though :)
>
>>
>> From a package manager point of view, is the best way to just provide
>> some configure options to allow a condition where "make test" is
>> "firewall-safe"..?
>
> Ideally the testsuite would work without networking (ie, only
> localhost). Failing that, disabling the firewall-unsafe code test is
> second-best. Third-best I can disable the testsuite entirely, but I
> think running the testsuite in the debian build farm could help catch
> bugs.
>
>
> BTW, the failure seems to be in test_validation, but I find it
> unlikely that the multicast tests will pass either.
>
> --
>
> Saludos,
> Felipe Sateler
|
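Editor's sketch, not part of the thread: "ignoring errors from SO_REUSEPORT" can be illustrated as follows. The headers may define SO_REUSEPORT even when the running kernel rejects it (the new-glibc, old-kernel case mentioned elsewhere in the thread), so the option is attempted and its result discarded. The helper name open_shared_udp_socket is hypothetical and this is not the actual liblo patch; it assumes a POSIX sockets API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request POSIX.1-2001 declarations */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not liblo's actual code): create a UDP socket whose
 * port can be shared, e.g. by several multicast receivers.
 * Returns the descriptor, or -1 on failure. */
int open_shared_udp_socket(void)
{
    int yes = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    /* SO_REUSEADDR is widely supported, so a failure here is treated
     * as fatal. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        close(fd);
        return -1;
    }

#ifdef SO_REUSEPORT
    /* The option may be known to the headers but not to the kernel,
     * so the return value is deliberately ignored. */
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes));
#endif

    return fd;
}
```

The trade-off of this approach is that a genuine SO_REUSEPORT failure goes unnoticed; logging the errno value instead of discarding it would be a reasonable variation.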
|
From: Felipe S. <fsa...@gm...> - 2014-02-06 22:10:33
|
On Thu, Feb 6, 2014 at 6:58 PM, Stephen Sinclair <rad...@gm...> wrote:
> On Thu, Feb 6, 2014 at 9:17 PM, Felipe Sateler <fsa...@gm...> wrote:
>> On Wed, Feb 5, 2014 at 1:02 PM, Felipe Sateler <fsa...@gm...> wrote:
>>> Note that there is also a chance that the host setup might affect
>>> builds (say, firewall rules).
>>
>> OK, I tested running testlo in a private network namespace (that is,
>> no internet access other than localhost) and the test fails gracefully
>> (ie, it doesn't hang forever).
>>
>> I think it could be that a firewall is dropping packets, is it
>> possible that liblo waits forever for a message to appear?
>
> I'm not aware of anywhere off the top of my head where it should do
> that (I explicitly programmed some timeouts in some places to avoid
> that kind of thing), however it definitely sounds like a possible
> explanation. The complexities of testing networking software! I
> think it's reasonable to give some flags to testlo to make it avoid
> any tests that don't use localhost, or even be able to run only the
> non-sending/receiving code. (Exercise the validation/serialisation
> paths for example, without actually inducing any network activity.)
>
> This will take a bit of work however. Ideally, something I've thought
> for a very long time but procrastinated on heavily, is that testlo
> should be split up into smaller programs or at least more modular
> subroutines.

Or use a proper unit testing framework ;). I totally understand the
procrastination, though :)

>
> From a package manager point of view, is the best way to just provide
> some configure options to allow a condition where "make test" is
> "firewall-safe"..?

Ideally the testsuite would work without networking (ie, only
localhost). Failing that, disabling the firewall-unsafe code test is
second-best. Third-best I can disable the testsuite entirely, but I
think running the testsuite in the debian build farm could help catch
bugs.

BTW, the failure seems to be in test_validation, but I find it
unlikely that the multicast tests will pass either.

--
Saludos,
Felipe Sateler
|
|
From: Stephen S. <rad...@gm...> - 2014-02-06 21:58:51
|
On Thu, Feb 6, 2014 at 9:17 PM, Felipe Sateler <fsa...@gm...> wrote:
> On Wed, Feb 5, 2014 at 1:02 PM, Felipe Sateler <fsa...@gm...> wrote:
>> Note that there is also a chance that the host setup might affect
>> builds (say, firewall rules).
>
> OK, I tested running testlo in a private network namespace (that is,
> no internet access other than localhost) and the test fails gracefully
> (ie, it doesn't hang forever).
>
> I think it could be that a firewall is dropping packets, is it
> possible that liblo waits forever for a message to appear?

I'm not aware of anywhere off the top of my head where it should do
that (I explicitly programmed some timeouts in some places to avoid
that kind of thing), however it definitely sounds like a possible
explanation. The complexities of testing networking software! I
think it's reasonable to give some flags to testlo to make it avoid
any tests that don't use localhost, or even be able to run only the
non-sending/receiving code. (Exercise the validation/serialisation
paths for example, without actually inducing any network activity.)

This will take a bit of work however. Ideally, something I've thought
for a very long time but procrastinated on heavily, is that testlo
should be split up into smaller programs or at least more modular
subroutines.

From a package manager point of view, is the best way to just provide
some configure options to allow a condition where "make test" is
"firewall-safe"..?

Steve
|
|
From: Felipe S. <fsa...@gm...> - 2014-02-06 20:19:01
|
On Wed, Feb 5, 2014 at 1:02 PM, Felipe Sateler <fsa...@gm...> wrote:
> Note that there is also a chance that the host setup might affect
> builds (say, firewall rules).

OK, I tested running testlo in a private network namespace (that is,
no internet access other than localhost) and the test fails gracefully
(ie, it doesn't hang forever).

I think it could be that a firewall is dropping packets, is it
possible that liblo waits forever for a message to appear?

--
Saludos,
Felipe Sateler
|
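Editor's sketch, not part of the thread: a test that waits for a reply can avoid hanging forever behind a packet-dropping firewall by bounding the wait with select(). The helper name wait_readable is hypothetical and does not describe how testlo or liblo implement their timeouts; it assumes a POSIX system and, for brevity, does not retry when select() is interrupted by a signal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request POSIX.1-2001 declarations */
#endif
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until `fd` is readable or `timeout_ms`
 * milliseconds have passed.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() returns the number of ready descriptors, 0 on timeout. */
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A test loop built on this can give up after a fixed number of timeouts and report a failure instead of stalling the build.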
|
From: Felipe S. <fsa...@gm...> - 2014-02-05 16:03:12
|
On Wed, Feb 5, 2014 at 11:06 AM, Stephen Sinclair <rad...@gm...> wrote: > On Wed, Feb 5, 2014 at 2:22 PM, Felipe Sateler <fsa...@gm...> wrote: >> On Wed, Feb 5, 2014 at 10:05 AM, Stephen Sinclair <rad...@gm...> wrote: >>> On Wed, Feb 5, 2014 at 12:28 PM, Felipe Sateler <fsa...@gm...> wrote: >> >>>> At the end of the log we see the reason: "Build killed with signal >>>> TERM after 150 minutes of inactivity". Perhaps testlo is waiting for a >>>> message that never gets sent? I have to note that the build daemons >>>> have very restricted internet access so it may well not be possible to >>>> access the computer itself over the external IP address. This failure >>>> is different, as the build daemon killed the build after a timeout, >>>> instead of the build actually failing. >>> >>> Off the top of my head I can't think why the test would stall for 150 >>> minutes, but perhaps there's an issue. I thought I'd programmed >>> testlo with a limited number of tries while waiting for responses from >>> subtest. I'll have to take a look. Hard to debug when I can't >>> reproduce it though. >>> >>> (In particular I'm confused by the i386 failure, since I test on my >>> 32-bit laptop all the time. And why is there no x86_64 test?) >> >> The logs are recorded by the build daemons. Because my own machine in >> which I build and upload the packages is x86_64, no log gets recorded >> by the build daemons (there is no need to perform a x86_64 build). >> >> The problem is likely to be due to the restricted environment in which >> the builds are performed, because on my own machine the tests pass >> just fine. > > Yes, I agree. I wish I could reproduce the setup, I suppose there is > some information on how Debian build machines are configured. It > never occurred to me to try building liblo in a chroot with the > headers from another kernel, for example ;-) There is some information at the following links, but I once tried to replicate the buildd setup and failed... 
because the build daemon only builds, it does not schedule. The buildd setup, plus inspecting buildd-watcher script to figure out how to schedule builds may get you up and running. Note that there is also a chance that the host setup might affect builds (say, firewall rules). https://wiki.debian.org/buildd https://buildd.debian.org/docs/buildd-setup.txt The standard setup is debian stable host, with the chroot of the target distribution (usually unstable). > >>>> Is it possible to tell testlo to only use localhost? That interface is >>>> known to be usable for builds. >>> >>> I'm not sure what the right approach is. testlo does in principle >>> depend on a sort of normal networking environment, where it tries to >>> detect the machine's host address and sends messages back to itself. >>> Perhaps such tests could be disabled somehow, though currently testlo >>> and "make test" takes no arguments. I suppose such a condition could >>> be specified as an option to configure.. >> >> If testlo cannot work using only localhost, then perhaps I should just >> disable the tests. But forcing testlo to use localhost perhaps is >> sufficient to make the tests pass in such a restricted environment. > > Well, we can make it work that way with an option, if necessary. I > have no problem doing a short-term 0.29 release just for Debian if it > comes to that. Lets try patching the debian package first. I have no problem adding patches that are part of the next release, and no need to inconvenience everyone else. > >>>> [1] https://buildd.debian.org/status/logs.php?pkg=liblo >>> >>> Thanks for this. It would be great to be able to do such tests on >>> this Debian system before a release next time. >> >> Yeah, sorry. I never uploaded rc1 to experimental, because the >> testsuite hung even on a normal machine (I believe you fixed this >> shortly after the rc). >> >> I do intend to upload any rc to debian experimental, but I failed this >> time. Sorry. 
> > No it's fine, I wasn't placing any blame, I just didn't fully > understand the resources available to Debian maintainers. I wish I > had this build farm available at my fingertips for this kind of > testing ;) :). > > I thought about doing an rc2 but since there were no more reports and > it built and tested fine for me on 2 architectures and 3 OSs I > figured it was okay to just go ahead. > > Still, I'm really curious why and where it is stalling during the > test. Personally I wouldn't upload it until we figure it out. It appears you have some confusion about the buildd network. I cannot[1] use the buildd network directly. I have to upload to Debian, where the system will pick up the new uploads and schedule the builds. So I have no choice but to upload and hope the latest change fixed it :(. On prereleases, I could upload to experimental to minimize problems (i.e., unstable users will not auto-upgrade to the prerelease version), but there is still a public upload involved (i.e., people can still install it if they so desire). [1] There are plans to create personal archives for developers precisely for this kind of pre-test, but they are not implemented yet. > Admittedly I've been meaning to fix up testlo to output more useful > and consistent logging information for some time. > > Any ideas on how I could reproduce the Debian build environment? For > example are there any virtual machine images available for this > purpose? Besides the documents listed above, I don't think there is much else about the build environment. AFAIK, there are no virtual machine images. -- Saludos, Felipe Sateler |
|
From: Stephen S. <rad...@gm...> - 2014-02-05 14:06:51
|
On Wed, Feb 5, 2014 at 2:22 PM, Felipe Sateler <fsa...@gm...> wrote: > On Wed, Feb 5, 2014 at 10:05 AM, Stephen Sinclair <rad...@gm...> wrote: >> On Wed, Feb 5, 2014 at 12:28 PM, Felipe Sateler <fsa...@gm...> wrote: > >>> At the end of the log we see the reason: "Build killed with signal >>> TERM after 150 minutes of inactivity". Perhaps testlo is waiting for a >>> message that never gets sent? I have to note that the build daemons >>> have very restricted internet access so it may well not be possible to >>> access the computer itself over the external IP address. This failure >>> is different, as the build daemon killed the build after a timeout, >>> instead of the build actually failing. >> >> Off the top of my head I can't think why the test would stall for 150 >> minutes, but perhaps there's an issue. I thought I'd programmed >> testlo with a limited number of tries while waiting for responses from >> subtest. I'll have to take a look. Hard to debug when I can't >> reproduce it though. >> >> (In particular I'm confused by the i386 failure, since I test on my >> 32-bit laptop all the time. And why is there no x86_64 test?) > > The logs are recorded by the build daemons. Because my own machine in > which I build and upload the packages is x86_64, no log gets recorded > by the build daemons (there is no need to perform a x86_64 build). > > The problem is likely to be due to the restricted environment in which > the builds are performed, because on my own machine the tests pass > just fine. Yes, I agree. I wish I could reproduce the setup, I suppose there is some information on how Debian build machines are configured. It never occurred to me to try building liblo in a chroot with the headers from another kernel, for example ;-) >>> Is it possible to tell testlo to only use localhost? That interface is >>> known to be usable for builds. >> >> I'm not sure what the right approach is. 
testlo does in principle >> depend on a sort of normal networking environment, where it tries to >> detect the machine's host address and sends messages back to itself. >> Perhaps such tests could be disabled somehow, though currently testlo >> and "make test" takes no arguments. I suppose such a condition could >> be specified as an option to configure.. > > If testlo cannot work using only localhost, then perhaps I should just > disable the tests. But forcing testlo to use localhost perhaps is > sufficient to make the tests pass in such a restricted environment. Well, we can make it work that way with an option, if necessary. I have no problem doing a short-term 0.29 release just for Debian if it comes to that. >>> [1] https://buildd.debian.org/status/logs.php?pkg=liblo >> >> Thanks for this. It would be great to be able to do such tests on >> this Debian system before a release next time. > > Yeah, sorry. I never uploaded rc1 to experimental, because the > testsuite hung even on a normal machine (I believe you fixed this > shortly after the rc). > > I do intend to upload any rc to debian experimental, but I failed this > time. Sorry. No it's fine, I wasn't placing any blame, I just didn't fully understand the resources available to Debian maintainers. I wish I had this build farm available at my fingertips for this kind of testing ;) I thought about doing an rc2 but since there were no more reports and it built and tested fine for me on 2 architectures and 3 OSs I figured it was okay to just go ahead. Still, I'm really curious why and where it is stalling during the test. Personally I wouldn't upload it until we figure it out. Admittedly I've been meaning to fix up testlo to output more useful and consistent logging information for some time. Any ideas on how I could reproduce the Debian build environment? For example are there any virtual machine images available for this purpose? Steve |
|
From: Felipe S. <fsa...@gm...> - 2014-02-05 13:23:25
|
On Wed, Feb 5, 2014 at 10:05 AM, Stephen Sinclair <rad...@gm...> wrote: > On Wed, Feb 5, 2014 at 12:28 PM, Felipe Sateler <fsa...@gm...> wrote: >> At the end of the log we see the reason: "Build killed with signal >> TERM after 150 minutes of inactivity". Perhaps testlo is waiting for a >> message that never gets sent? I have to note that the build daemons >> have very restricted internet access so it may well not be possible to >> access the computer itself over the external IP address. This failure >> is different, as the build daemon killed the build after a timeout, >> instead of the build actually failing. > > Off the top of my head I can't think why the test would stall for 150 > minutes, but perhaps there's an issue. I thought I'd programmed > testlo with a limited number of tries while waiting for responses from > subtest. I'll have to take a look. Hard to debug when I can't > reproduce it though. > > (In particular I'm confused by the i386 failure, since I test on my > 32-bit laptop all the time. And why is there no x86_64 test?) The logs are recorded by the build daemons. Because my own machine in which I build and upload the packages is x86_64, no log gets recorded by the build daemons (there is no need to perform a x86_64 build). The problem is likely to be due to the restricted environment in which the builds are performed, because on my own machine the tests pass just fine. >> Is it possible to tell testlo to only use localhost? That interface is >> known to be usable for builds. > > I'm not sure what the right approach is. testlo does in principle > depend on a sort of normal networking environment, where it tries to > detect the machine's host address and sends messages back to itself. > Perhaps such tests could be disabled somehow, though currently testlo > and "make test" takes no arguments. I suppose such a condition could > be specified as an option to configure.. 
If testlo cannot work using only localhost, then perhaps I should just disable the tests. But forcing testlo to use localhost perhaps is sufficient to make the tests pass in such a restricted environment. > >> [1] https://buildd.debian.org/status/logs.php?pkg=liblo > > Thanks for this. It would be great to be able to do such tests on > this Debian system before a release next time. Yeah, sorry. I never uploaded rc1 to experimental, because the testsuite hung even on a normal machine (I believe you fixed this shortly after the rc). I do intend to upload any rc to debian experimental, but I failed this time. Sorry. -- Saludos, Felipe Sateler |
|
From: Stephen S. <rad...@gm...> - 2014-02-05 13:06:02
|
On Wed, Feb 5, 2014 at 12:28 PM, Felipe Sateler <fsa...@gm...> wrote: > On Wed, Feb 5, 2014 at 8:19 AM, Stephen Sinclair <rad...@gm...> wrote: >> Wait a second.. I've just reviewed the source and the build log in a >> bit more detail. > > For which version? 0.28-1 or 0.28-2? I hadn't noticed there were multiple versions, but I guess I was looking at what you linked to before, which seems to be 0.28-2. >> In server.c, indeed the result of setting SO_REUSEPORT is actually >> ignored -- and is only tried in a multicast context in any case. > > I'm not actually sure about that... that would depend on what does lo_throw do. You're right, my mistake. In testlo lo_throw() causes a failure exit. [... skip ..] > At the end of the log we see the reason: "Build killed with signal > TERM after 150 minutes of inactivity". Perhaps testlo is waiting for a > message that never gets sent? I have to note that the build daemons > have very restricted internet access so it may well not be possible to > access the computer itself over the external IP address. This failure > is different, as the build daemon killed the build after a timeout, > instead of the build actually failing. Off the top of my head I can't think why the test would stall for 150 minutes, but perhaps there's an issue. I thought I'd programmed testlo with a limited number of tries while waiting for responses from subtest. I'll have to take a look. Hard to debug when I can't reproduce it though. (In particular I'm confused by the i386 failure, since I test on my 32-bit laptop all the time. And why is there no x86_64 test?) > Is it possible to tell testlo to only use localhost? That interface is > known to be usable for builds. I'm not sure what the right approach is. testlo does in principle depend on a sort of normal networking environment, where it tries to detect the machine's host address and sends messages back to itself. 
Perhaps such tests could be disabled somehow, though currently testlo and "make test" takes no arguments. I suppose such a condition could be specified as an option to configure.. > [1] https://buildd.debian.org/status/logs.php?pkg=liblo Thanks for this. It would be great to be able to do such tests on this Debian system before a release next time. Steve |
|
From: Felipe S. <fsa...@gm...> - 2014-02-05 12:28:48
|
On Wed, Feb 5, 2014 at 8:19 AM, Stephen Sinclair <rad...@gm...> wrote: > Wait a second.. I've just reviewed the source and the build log in a > bit more detail. For which version? 0.28-1 or 0.28-2? > > In server.c, indeed the result of setting SO_REUSEPORT is actually > ignored -- and is only tried in a multicast context in any case. I'm not actually sure about that... that would depend on what does lo_throw do. > > Secondly, the build log doesn't seem to indicate anything to me about > SO_REUSEPORT. The errors are actually quite varied. I think the > SO_REUSEPORT thing may be a red herring, or at least I don't see any > evidence for it. > > Let's see: > > armel: seems successful. > > armhf: everything seems to go as expected without errors, but then the > process exits with Error 2. Any way of getting more information? > > ia64: the error is a C++ compile error, which version of g++ is in > use? I think the C++11 features may fail on g++4.6, where C++11 was > only partially implemented yet the lambda test passes in configure. > That's the only case I > know of that could cause this problem. ia64 is going away (inability to update gcc among other issues) so we can ignore it. > > kfreebsd: yes, testlo failed for me when i tested on freebsd in qemu > > i386, s390x, powerpc >> liblo server error 9912 in /: Invalid message received (expected) >> E: Caught signal ‘Terminated’: terminating immediately > > What's that all about? Does the debian build server send a terminate > signal to the test process for some reason? I think you are watching the build logs for 0.28-2, where I patched out the SO_REUSEPORT section. I actually had a draft reply noting that the builds still fail, although this seems to be a different reason. At [1] you can find all build logs. 
The build logs for 0.28-1 (on i386 to eliminate strange arch issues) seem to be regular failures:

===
liblo server error 9912 in /: Invalid message received (expected)
test run not completed
liblo test FAILED
Reply received from osc.udp://206.12.19.115:15954/
Reply received from osc.udp://206.12.19.115:15954/
make[2]: *** [test] Error 1
liblo server error 92 in setsockopt(SO_REUSEPORT): Protocol not availablemake[2]: Leaving directory `/«PKGBUILDDIR»/src'
make[1]: *** [test] Error 2
make[1]: Leaving directory `/«PKGBUILDDIR»'
make: *** [debian/stamp-makefile-check] Error 2
dpkg-buildpackage: error: debian/rules build-arch gave error exit status 2
===

However for version 0.28-2 I removed the SO_REUSEPORT code and this is what we get (again i386):

===
Reply sent to osc.udp://206.12.19.115:18934/
liblo server error 9912 in /: Invalid message received (expected)
make[1]: *** wait: No child processes. Stop.
make[1]: *** Waiting for unfinished jobs....
make[1]: *** wait: No child processes. Stop.
make: *** [debian/stamp-makefile-check] Error 2
make[2]: *** [test] Terminated
Reply received from osBuild killed with signal TERM after 150 minutes of inactivity
===

At the end of the log we see the reason: "Build killed with signal TERM after 150 minutes of inactivity". Perhaps testlo is waiting for a message that never gets sent? I have to note that the build daemons have very restricted internet access so it may well not be possible to access the computer itself over the external IP address. This failure is different, as the build daemon killed the build after a timeout, instead of the build actually failing. Is it possible to tell testlo to only use localhost? That interface is known to be usable for builds. [1] https://buildd.debian.org/status/logs.php?pkg=liblo -- Saludos, Felipe Sateler |
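The stall in the 0.28-2 log, together with the question about restricting testlo to localhost, suggests two properties a network self-test could have: it talks only to the loopback interface, and it gives up after a fixed number of tries. The sketch below illustrates both with plain BSD sockets. It is not testlo's code; the function name `loopback_roundtrip` and its parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of liblo or testlo.
 * Sends one UDP datagram to itself over 127.0.0.1 and waits for it,
 * using at most max_tries waits of timeout_ms milliseconds each.
 * Returns 1 if the datagram arrived, 0 if every try timed out,
 * and -1 if the socket could not be set up or the send failed. */
static int loopback_roundtrip(int max_tries, int timeout_ms)
{
    assert(max_tries >= 0 && timeout_ms >= 0);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* never the external IP */
    addr.sin_port = 0;                             /* kernel picks a free port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0 ||
        sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof addr) != 4) {
        close(fd);
        return -1;
    }

    int received = 0;
    for (int i = 0; i < max_tries && !received; i++) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

        /* select() returns 0 on timeout; that simply uses up one try. */
        if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0) {
            char buf[16];
            received = recv(fd, buf, sizeof buf, 0) == 4;
        }
    }

    close(fd);
    return received;
}
```

With a bound like this, a missing reply turns into a test failure after roughly `max_tries * timeout_ms` milliseconds instead of an inactivity kill by the build daemon. Whether testlo's stall actually comes from an unbounded wait is not established in this thread.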
|
From: Stephen S. <rad...@gm...> - 2014-02-05 11:19:53
|
Wait a second.. I've just reviewed the source and the build log in a bit more detail. In server.c, indeed the result of setting SO_REUSEPORT is actually ignored -- and is only tried in a multicast context in any case. Secondly, the build log doesn't seem to indicate anything to me about SO_REUSEPORT. The errors are actually quite varied. I think the SO_REUSEPORT thing may be a red herring, or at least I don't see any evidence for it. Let's see: armel: seems successful. armhf: everything seems to go as expected without errors, but then the process exits with Error 2. Any way of getting more information? ia64: the error is a C++ compile error, which version of g++ is in use? I think the C++11 features may fail on g++4.6, where C++11 was only partially implemented yet the lambda test passes in configure. That's the only case I know of that could cause this problem. kfreebsd: yes, testlo failed for me when i tested on freebsd in qemu i386, s390x, powerpc > liblo server error 9912 in /: Invalid message received (expected) > E: Caught signal ‘Terminated’: terminating immediately What's that all about? Does the debian build server send a terminate signal to the test process for some reason? Steve On Tue, Feb 4, 2014 at 10:45 PM, Stephen Sinclair <rad...@gm...> wrote: > On Tue, Feb 4, 2014 at 1:59 PM, Felipe Sateler <fsa...@gm...> wrote: >> On Tue, Feb 4, 2014 at 4:42 AM, Stephen Sinclair <rad...@gm...> wrote: >>> Hi Felipe, >>> >>> Hm, that is unexpected. Since it is already #ifdef'd out, why is it >>> causing a problem? I'm not sure I understand the mechanism behind the >>> build network. Is it being built on an old kernel, but against >>> headers for a newer kernel? That's the only reason I could think this >>> would cause a problem.. >> >> Indeed, that is the problem, as already mentioned by IOhannes. >> The buildd network consists of machines >> running debian stable (that is, a 3.2 kernel) in which a chroot with >> debian unstable is created. 
The package is built in the chroot, so >> that means we have new libc and kernel headers, but old runtime >> kernel. As mentioned in the original mail, the problem is during the >> testsuite run, not at build time. > > Sure -- I was proposing running a program at configure time to detect > if it is available, but you're right it may be better to do it every > time the code runs. This implies (as I state below) simply trying to > set it but ignoring any error, but this just feels a bit weird to me. > >>> That said, perhaps it would actually be better to detect the existence >>> of SO_REUSEPORT at configure time instead of compile time, not sure if >>> that would fix the problem or not. >> >> It wouldn't really fix it, because the fundamental problem is that >> kernel features can only be detected at runtime, because there is no >> guarantee that the running kernel is the same as the one with which >> the package was built. It would make it easier to disable it, though. >> >> Maybe on lo_server_new a test could be made to check if SO_REUSEPORT >> is supported by the kernel (AFAICT SO_REUSEPORT is only used on the >> server?). > > That's not a bad idea... > >> I'm a bit confused as to how lo_throw works, but that seems to be the >> proximate cause of the problem. Alternative solution: perhaps liblo >> can just ignore the setsockopts(SO_REUSEPORT) errors? > > I think ignoring the error is the correct behaviour in conditions > where it is not needed anyway, but then perhaps it shouldn't be > attempted in the first place. The goal is to not have unpredictable > behaviour, unicast and multicast should "just work" under all > supported operating systems. I have to go back through old emails, > but if I remember it turned out that SO_REUSEPORT was needed on OS X, > and available but not actually needed on Linux. > >>> Yes, technically you could just take out those sections of the code, >>> but I was hoping you wouldn't need to repack the archive. 
I don't >>> understand why SO_REUSEPORT is defined if it is not supported. >> >> Ok, I'm commenting out the code section to prevent build failures for >> now. If SO_REUSEPORT can be detected at runtime or the errors can be >> safely ignored, I can replace the patch with one that does the right >> thing. > > I think patching the code for Linux is an ok solution for now, but I'd > like to figure out the right thing to do. I would have been happy > taking the flag out entirely as IOhannes has suggested in the past, > but I ran into some situation which needed it, and made the assumption > that if it is defined, it should be required. > > > Steve |
|
From: Stephen S. <rad...@gm...> - 2014-02-04 21:45:31
|
On Tue, Feb 4, 2014 at 1:59 PM, Felipe Sateler <fsa...@gm...> wrote: > On Tue, Feb 4, 2014 at 4:42 AM, Stephen Sinclair <rad...@gm...> wrote: >> Hi Felipe, >> >> Hm, that is unexpected. Since it is already #ifdef'd out, why is it >> causing a problem? I'm not sure I understand the mechanism behind the >> build network. Is it being built on an old kernel, but against >> headers for a newer kernel? That's the only reason I could think this >> would cause a problem.. > > Indeed, that is the problem, as already mentioned by IOhannes. > The buildd network consists of machines > running debian stable (that is, a 3.2 kernel) in which a chroot with > debian unstable is created. The package is built in the chroot, so > that means we have new libc and kernel headers, but old runtime > kernel. As mentioned in the original mail, the problem is during the > testsuite run, not at build time. Sure -- I was proposing running a program at configure time to detect if it is available, but you're right it may be better to do it every time the code runs. This implies (as I state below) simply trying to set it but ignoring any error, but this just feels a bit weird to me. >> That said, perhaps it would actually be better to detect the existence >> of SO_REUSEPORT at configure time instead of compile time, not sure if >> that would fix the problem or not. > > It wouldn't really fix it, because the fundamental problem is that > kernel features can only be detected at runtime, because there is no > guarantee that the running kernel is the same as the one with which > the package was built. It would make it easier to disable it, though. > > Maybe on lo_server_new a test could be made to check if SO_REUSEPORT > is supported by the kernel (AFAICT SO_REUSEPORT is only used on the > server?). That's not a bad idea... > I'm a bit confused as to how lo_throw works, but that seems to be the > proximate cause of the problem. 
Alternative solution: perhaps liblo > can just ignore the setsockopts(SO_REUSEPORT) errors? I think ignoring the error is the correct behaviour in conditions where it is not needed anyway, but then perhaps it shouldn't be attempted in the first place. The goal is to not have unpredictable behaviour, unicast and multicast should "just work" under all supported operating systems. I have to go back through old emails, but if I remember it turned out that SO_REUSEPORT was needed on OS X, and available but not actually needed on Linux. >> Yes, technically you could just take out those sections of the code, >> but I was hoping you wouldn't need to repack the archive. I don't >> understand why SO_REUSEPORT is defined if it is not supported. > > Ok, I'm commenting out the code section to prevent build failures for > now. If SO_REUSEPORT can be detected at runtime or the errors can be > safely ignored, I can replace the patch with one that does the right > thing. I think patching the code for Linux is an ok solution for now, but I'd like to figure out the right thing to do. I would have been happy taking the flag out entirely as IOhannes has suggested in the past, but I ran into some situation which needed it, and made the assumption that if it is defined, it should be required. Steve |
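The policy discussed here (ignore the failure where the option is not needed, but not where it is) could be sketched as follows. This is an illustration only, not liblo's server.c; the helper name `set_reuseport` and its `required` flag are hypothetical. The only failure it tolerates is `ENOPROTOOPT`, which is the "Protocol not available" error (errno 92 on Linux) shown in the 0.28-1 build log.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of liblo.
 * Tries to enable SO_REUSEPORT on fd.
 * Returns 0 if the option was set, or if the kernel does not know the
 * option and the caller said it is not required.
 * Returns -1 otherwise, with errno left as set by setsockopt(). */
static int set_reuseport(int fd, int required)
{
    assert(fd >= 0);
#ifdef SO_REUSEPORT
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) == 0)
        return 0;

    /* The headers define the option but the running kernel rejects it:
     * tolerate this only when the caller can live without the option. */
    if (errno == ENOPROTOOPT && !required)
        return 0;

    return -1;
#else
    /* The headers do not define the option at all. */
    (void)fd;
    return required ? -1 : 0;
#endif
}
```

A caller might pass `required = 0` for a unicast socket and decide per platform what to pass for multicast; which platforms genuinely need the option is exactly the open question in this message.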
|
From: Felipe S. <fsa...@gm...> - 2014-02-04 13:00:15
|
On Tue, Feb 4, 2014 at 4:42 AM, Stephen Sinclair <rad...@gm...> wrote: > Hi Felipe, > > Hm, that is unexpected. Since it is already #ifdef'd out, why is it > causing a problem? I'm not sure I understand the mechanism behind the > build network. Is it being built on an old kernel, but against > headers for a newer kernel? That's the only reason I could think this > would cause a problem.. Indeed, that is the problem, as already mentioned by IOhannes. The buildd network consists of machines running Debian stable (that is, a 3.2 kernel) in which a chroot with Debian unstable is created. The package is built in the chroot, so that means we have new libc and kernel headers, but an old runtime kernel. As mentioned in the original mail, the problem is during the testsuite run, not at build time. > > That said, perhaps it would actually be better to detect the existence > of SO_REUSEPORT at configure time instead of compile time, not sure if > that would fix the problem or not. It wouldn't really fix it, because the fundamental problem is that kernel features can only be detected at runtime, since there is no guarantee that the running kernel is the same as the one with which the package was built. It would make it easier to disable it, though. Maybe in lo_server_new a test could be made to check if SO_REUSEPORT is supported by the kernel (AFAICT SO_REUSEPORT is only used on the server?). I'm a bit confused as to how lo_throw works, but that seems to be the proximate cause of the problem. Alternative solution: perhaps liblo can just ignore the setsockopt(SO_REUSEPORT) errors? > Yes, technically you could just take out those sections of the code, > but I was hoping you wouldn't need to repack the archive. I don't > understand why SO_REUSEPORT is defined if it is not supported. Ok, I'm commenting out the code section to prevent build failures for now. 
If SO_REUSEPORT can be detected at runtime or the errors can be safely ignored, I can replace the patch with one that does the right thing. -- Saludos, Felipe Sateler |
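A runtime check of the kind suggested in this message could look roughly like the sketch below: it asks the running kernel, rather than the headers, whether SO_REUSEPORT is usable. This is not liblo code, and the helper name `reuseport_supported` is hypothetical. It assumes a throwaway UDP socket is an acceptable probe.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of liblo.
 * Returns 1 if the running kernel accepts SO_REUSEPORT on a UDP socket,
 * 0 if it rejects it (or the headers never defined it),
 * and -1 if no probe socket could be created. */
static int reuseport_supported(void)
{
#ifdef SO_REUSEPORT
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    int rc = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on);
    close(fd); /* the probe socket is never bound or used */

    return rc == 0 ? 1 : 0;
#else
    return 0;
#endif
}
```

The `#ifdef` still matters: it covers systems whose headers lack the option, while the `setsockopt()` call covers the buildd case described above, where new headers meet an older runtime kernel.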
|
From: IOhannes m z. <zmo...@ie...> - 2014-02-04 10:36:27
|
On 2014-02-04 08:42, Stephen Sinclair wrote: > Yes, technically you could just take out those sections of the > code, but I was hoping you wouldn't need to repack the archive. I > don't understand why SO_REUSEPORT is defined if it is not > supported. Mainly because SO_REUSEPORT is defined by a header that is installed on the hard disk (socket.h), but whether it is actually supported by your system can only be found out at runtime (it depends on which kernel is actually running). fgmasdr IOhannes |
|
From: Stephen S. <rad...@gm...> - 2014-02-04 07:42:29
|
Hi Felipe, Hm, that is unexpected. Since it is already #ifdef'd out, why is it causing a problem? I'm not sure I understand the mechanism behind the build network. Is it being built on an old kernel, but against headers for a newer kernel? That's the only reason I could think this would cause a problem.. That said, perhaps it would actually be better to detect the existence of SO_REUSEPORT at configure time instead of compile time, not sure if that would fix the problem or not. Yes, technically you could just take out those sections of the code, but I was hoping you wouldn't need to repack the archive. I don't understand why SO_REUSEPORT is defined if it is not supported. Steve On Mon, Feb 3, 2014 at 6:00 PM, Felipe Sateler <fsa...@gm...> wrote: > On Mon, Feb 3, 2014 at 11:14 AM, Felipe Sateler <fsa...@gm...> wrote: >> On Mon, Jan 27, 2014 at 3:00 PM, Stephen Sinclair <rad...@gm...> wrote: >>> We are pleased to present stable release 0.28 of LibLo, the >>> lightweight, easy to use implementation of the Open Sound Control >>> protocol. >> >> Hi Stephen, >> >> I uploaded liblo to debian, but I'm encountering failures in the test >> suite (first time enabled in this version)[1]. Apparently SO_REUSEPORT >> is not available in the buildd network (linux is 3.2, and REUSEPORT >> needs 3.9). >> >> I think the REUSEPORT check is done at build time. Could it be done at >> runtime? Or is there another workaround? > > It seems to me that I could just patch out the part that is already > #ifdef protected. Do I risk breaking anything by doing so? > > > > -- > > Saludos, > Felipe Sateler > > _______________________________________________ > liblo-devel mailing list > lib...@li... > https://lists.sourceforge.net/lists/listinfo/liblo-devel |
|
From: Felipe S. <fsa...@gm...> - 2014-02-03 17:01:25
|
On Mon, Feb 3, 2014 at 11:14 AM, Felipe Sateler <fsa...@gm...> wrote: > On Mon, Jan 27, 2014 at 3:00 PM, Stephen Sinclair <rad...@gm...> wrote: >> We are pleased to present stable release 0.28 of LibLo, the >> lightweight, easy to use implementation of the Open Sound Control >> protocol. > > Hi Stephen, > > I uploaded liblo to debian, but I'm encountering failures in the test > suite (first time enabled in this version)[1]. Apparently SO_REUSEPORT > is not available in the buildd network (linux is 3.2, and REUSEPORT > needs 3.9). > > I think the REUSEPORT check is done at build time. Could it be done at > runtime? Or is there another workaround? It seems to me that I could just patch out the part that is already #ifdef protected. Do I risk breaking anything by doing so? -- Saludos, Felipe Sateler |