Discussion:
ntpd wedged by libc?
AGC
2012-02-07 17:40:46 UTC
Seems I'm still having issues with libc on 5.1/sparc specifically with
ntpd wedging when doing math:

#0 0x103d38c8 in __pow5mult_D2A () from /usr/lib/libc.so.12
#1 0x103d3ac4 in __muldi3 () from /usr/lib/libc.so.12
#2 0x103d34dc in __mult_D2A () from /usr/lib/libc.so.12
#3 0x103d3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
#4 0x103c61d4 in __dtoa () from /usr/lib/libc.so.12
#5 0x103c315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
#6 0x103330c4 in snprintf () from /usr/lib/libc.so.12
#7 0x000256f4 in ctl_putdblf (tag=0x87d79 "", fmt=0x88458 "%.3f", d=4.5623779296875) at ntp_control.c:1431
#8 0x00025ed0 in ctl_putpeer (id=14, p=0xb1bc0) at ntp_control.c:2459
#9 0x0002adbc in read_variables (rbufp=0x10522000, restrict_mask=0) at ntp_control.c:2981
#10 0x00028798 in process_control (rbufp=0x10522000, restrict_mask=0) at ntp_control.c:1121
#11 0x00038150 in receive (rbufp=0x10522000) at ntp_proto.c:417
#12 0x0002323c in ntpdmain (argc=0, argv=0xefffe8a8) at ntpd.c:1069
#13 0x000138bc in ___start ()
#14 0x000137f4 in _start ()


Ideas anyone?
David Laight
2012-02-07 21:43:37 UTC
Post by AGC
Seems I'm still having issues with libc on 5.1/sparc specifically with
Ideas anyone?
Wasn't there some silly fubar bug in dtoa() ??

David
--
David Laight: ***@l8s.co.uk
AGC
2012-02-07 23:13:19 UTC
Post by David Laight
Post by AGC
Seems I'm still having issues with libc on 5.1/sparc specifically with
Ideas anyone?
Wasn't there some silly fubar bug in dtoa() ??
David
Yes, it's there somehow, but it's apparently in libc. I compiled a
workaround into ntpd which appears to be working; I'm doing more tests.
David Brownlee
2012-02-08 11:56:25 UTC
Post by David Laight
Post by AGC
Seems I'm still having issues with libc on 5.1/sparc specifically with
Ideas anyone?
Wasn't there some silly fubar bug in dtoa() ??
       David
Yes, it's there somehow but it's apparently in libc.  I had compiled a
workaround inside ntpd which appears to be working, I'm doing more tests.
Apologies for asking a potentially stupid question:

Assuming this is just a specific toxic value or values, rather than
some strange state caused by a sequence of events, is it not possible
to run with a version of libc which logs every value passed to dtoa(),
or just run a (quite long running :) test program which tests every
bit pattern?
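
A crude form of the brute-force test suggested above can be sketched in shell. This is only an illustration: it assumes the system awk formats numbers through the same libc printf code that ntpd reaches via snprintf(), and the helper name format_many, the step size, and the count are arbitrary choices rather than anything from this thread. It samples a range of values starting at the one in the backtrace and does not cover every bit pattern.

```shell
#!/bin/sh
# Sketch: push a range of values through a "%.3f" conversion inside one
# long-running awk process. format_many is a hypothetical helper name.
format_many() {
    count=$1
    awk -v n="$count" 'BEGIN {
        ok = 0
        for (i = 0; i < n; i++) {
            # Start at the value seen in the backtrace, step by 1/1024.
            s = sprintf("%.3f", 4.5623779296875 + i / 1024)
            ok++
        }
        # Report how many conversions completed and the last result.
        print ok, s
    }'
}

format_many 10000
```

If the conversion path wedges, the command never prints its summary line; memory growth would have to be observed separately while it runs.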
AGC
2012-02-08 20:24:00 UTC
Post by David Brownlee
Post by AGC
Post by David Laight
Post by AGC
Seems I'm still having issues with libc on 5.1/sparc specifically with
Ideas anyone?
Wasn't there some silly fubar bug in dtoa() ??
David
Yes, it's there somehow but it's apparently in libc. I had compiled a
workaround inside ntpd which appears to be working, I'm doing more tests.
Assuming this is just a specific toxic value or values, rather than
some strange state caused by a sequence of events, is it not possible
to run with a version of libc which logs every value passed to dtoa(),
or just run a (quite long running :) test program which tests every
bit pattern?
A test program was suggested at one point on this list. I don't know
yet what became of those tests, or whether anyone else was also trying
them. I need to go find that code and run it again now that the system
is up and running.
Christos Zoulas
2012-02-09 03:03:06 UTC
Post by David Brownlee
Assuming this is just a specific toxic value or values, rather than
some strange state caused by a sequence of events, is it not possible
to run with a version of libc which logs every value passed to dtoa(),
or just run a (quite long running :) test program which tests every
bit pattern?
Nope, this is just a memory leak. Convert enough floating-point values
and you run out of memory (the missing Bfree(b) in the patch below).
This should be pulled up to 5.

christos

Index: misc.c
===================================================================
RCS file: /cvsroot/src/lib/libc/gdtoa/misc.c,v
retrieving revision 1.7
retrieving revision 1.11
diff -u -r1.7 -r1.11
--- misc.c 21 Mar 2011 04:52:09 -0000 1.7
+++ misc.c 21 Nov 2011 09:46:19 -0000 1.11
@@ -76,8 +76,10 @@
 		else
 			rv = (Bigint*)MALLOC(len*sizeof(double));
 #endif
-		if (rv == NULL)
+		if (rv == NULL) {
+			FREE_DTOA_LOCK(0);
 			return NULL;
+		}
 		rv->k = k;
 		rv->maxwds = x;
 		}
@@ -415,8 +417,10 @@
 		ACQUIRE_DTOA_LOCK(1);
 		if (!(p5 = p5s)) {
 			p5 = p5s = i2b(625);
-			if (p5 == NULL)
+			if (p5 == NULL) {
+				FREE_DTOA_LOCK(1);
 				return NULL;
+			}
 			p5->next = 0;
 			}
 		FREE_DTOA_LOCK(1);
@@ -432,6 +436,7 @@
 			b1 = mult(b, p5);
 			if (b1 == NULL)
 				return NULL;
+			Bfree(b);
 			b = b1;
 			}
 		if (!(k = (unsigned int)k >> 1))
@@ -441,8 +446,10 @@
 			ACQUIRE_DTOA_LOCK(1);
 			if (!(p51 = p5->next)) {
 				p51 = p5->next = mult(p5,p5);
-				if (p51 == NULL)
+				if (p51 == NULL) {
+					FREE_DTOA_LOCK(1);
 					return NULL;
+				}
 				p51->next = 0;
 				}
 			FREE_DTOA_LOCK(1);
christos
AGC
2012-02-09 03:49:15 UTC
Post by Christos Zoulas
Post by David Brownlee
Assuming this is just a specific toxic value or values, rather than
some strange state caused by a sequence of events, is it not possible
to run with a version of libc which logs every value passed to dtoa(),
or just run a (quite long running :) test program which tests every
bit pattern?
Nope this is just a memory leak. Convert enough floating point values
and you run out of memory [the missing Bfree(b)]. This should be pulled
up to 5.
christos
That would certainly explain why ntpd triggers it since it's converting
lots of floats on a regular basis.

So how can I fix this problem?
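
One way to check the leak theory on a running system is to sample the process's virtual size over time. The sketch below uses only ps; the helper names vsz_of and watch_vsz, the sample count, and the interval are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the virtual size of a process (as reported by ps) at
# intervals. A value that keeps climbing is consistent with a leak.
vsz_of() {
    # "vsz=" suppresses the header so only the number is printed.
    ps -o vsz= -p "$1" | tr -d ' '
}

watch_vsz() {
    pid=$1; samples=$2; interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "$(date '+%H:%M:%S') $(vsz_of "$pid")"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: two samples of this shell itself, one second apart.
# To watch ntpd, pass its PID instead of $$.
watch_vsz $$ 2 1
```

For a daemon that leaks slowly, a much longer interval (minutes rather than seconds) is likely to be more informative.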
Christos Zoulas
2012-02-09 04:27:11 UTC
On Feb 8, 7:49pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/8/2012 19:03, Christos Zoulas wrote:
| > In article<CAGN_6pY9GoO6jOKSGuKJnDF_ZFRom9HaVZhUHBjMEn=***@mail.gmail.com>,
| > David Brownlee<***@netbsd.org> wrote:
| >> On 7 February 2012 23:13, AGC<agcarver+***@acarver.net> wrote:
| >> Apologies for asking a potentially stupid question:
| >>
| >> Assuming this is just a specific toxic value or values, rather than
| >> some strange state caused by a sequence of events, is it not possible
| >> to run with a version of libc which logs every value passed to dtoa(),
| >> or just run a (quite long running :) test program which tests every
| >> bit pattern?
| >
| > Nope this is just a memory leak. Convert enough floating point values
| > and you run out of memory [the missing Bfree(b)]. This should be pulled
| > up to 5.
| >
| > christos
|
| That would certainly explain why ntpd triggers it since it's converting
| lots of floats on a regular basis.
|
| So how can I fix this problem?

Apply the attached patch to your libc and recompile it.

christos
AGC
2012-02-09 05:31:17 UTC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>>
|>> Assuming this is just a specific toxic value or values, rather than
|>> some strange state caused by a sequence of events, is it not possible
|>> to run with a version of libc which logs every value passed to dtoa(),
|>> or just run a (quite long running :) test program which tests every
|>> bit pattern?
|>
|> Nope this is just a memory leak. Convert enough floating point values
|> and you run out of memory [the missing Bfree(b)]. This should be pulled
|> up to 5.
|>
|> christos
|
| That would certainly explain why ntpd triggers it since it's converting
| lots of floats on a regular basis.
|
| So how can I fix this problem?
Apply the attached patch to your libc and recompile it.
christos
That's where I'm a bit lost: I installed the OS from CD, so I only
have the precompiled libraries. The only part of the OS that I've
ever recompiled was the kernel, to enable PPS_SYNC. If there's a
document online that describes rebuilding the NetBSD libraries, I
would be grateful for a pointer. The searches I've done so far are a
bit confusing and lead me to believe that I have to recompile
everything and reinstall nearly from scratch.

It would also be a good exercise, so I can learn to upgrade the system
in the future (another process that I've not quite understood).
Christos Zoulas
2012-02-09 13:56:20 UTC
On Feb 8, 9:31pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| That's where I'm a bit lost since I installed the OS from CD so I only
| have the precompiled libraries. The only part of the OS that I've
| recompiled ever was the kernel to enable PPS_SYNC. If there's a
| document online somewhere that you could point to which would describe
| rebuilding the NetBSD libraries I would be grateful. The searches I've
| done so far are a bit confusing and lead me to believe that I have to
| recompile everything and reinstall nearly from scratch.
|
| It would also be a good exercise to learn so I can upgrade the system in
| the future (also another process that I've not quite understood).


If you have the sources online: cd /usr/src/lib/libc/, apply the patch,
then make && make install. Post to the lists and someone might have a
step-by-step guide.
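
The steps above might look roughly like the following. SRCDIR, PATCHFILE, and the run and rebuild_libc helpers are hypothetical names; the script defaults to a dry run that only prints the commands, and nothing in this thread confirms that the diff applies cleanly to a given source tree.

```shell
#!/bin/sh
# Sketch of the patch-and-rebuild steps. SRCDIR and PATCHFILE are
# assumptions; adjust them to where the sources and the saved diff live.
SRCDIR=${SRCDIR:-/usr/src}
PATCHFILE=${PATCHFILE:-/tmp/gdtoa-misc.diff}
# With DRYRUN=yes (the default here) commands are printed, not executed.
DRYRUN=${DRYRUN:-yes}

run() {
    if [ "$DRYRUN" = yes ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_libc() {
    # The diff names misc.c without a directory, so apply it from gdtoa/.
    run cd "$SRCDIR/lib/libc/gdtoa" || return 1
    run patch -i "$PATCHFILE" || return 1
    run cd "$SRCDIR/lib/libc" || return 1
    run make || return 1
    run make install
}

rebuild_libc
```

Setting DRYRUN=no executes the commands instead of printing them; each step stops the sequence if it fails.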


christos
AGC
2012-02-11 09:11:13 UTC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| That's where I'm a bit lost since I installed the OS from CD so I only
| have the precompiled libraries. The only part of the OS that I've
| recompiled ever was the kernel to enable PPS_SYNC. If there's a
| document online somewhere that you could point to which would describe
| rebuilding the NetBSD libraries I would be grateful. The searches I've
| done so far are a bit confusing and lead me to believe that I have to
| recompile everything and reinstall nearly from scratch.
|
| It would also be a good exercise to learn so I can upgrade the system in
| the future (also another process that I've not quite understood).
If you have the sources online, cd /usr/src/lib/libc/, apply the patch,
make && make install. Post to the lists and someone might have a step
by step guide.
Well, unfortunately that doesn't seem to work when using the sources
from NetBSD-current (which already has the patch):

# pwd
/mnt/src/lib/libc
# make
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 1: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 2: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 3: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 4: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 5: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 6: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 7: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 8: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 9: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 10: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 12: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 13: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 14: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 15: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 16: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 17: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 18: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 19: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 20: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 21: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 22: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 23: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 24: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 25: Need an operator
No closing parenthesis in archive specification
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 26: Error in archive specification: ""
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 27: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 28: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 29: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 30: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 31: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 32: Need an operator
No closing parenthesis in archive specification
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 34: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 37: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 57: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 58: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 59: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 75: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 76: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 77: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 78: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 79: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 92: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 93: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 94: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 95: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 96: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 97: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 98: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 109: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 110: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 111: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 112: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 122: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 123: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 124: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 125: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 126: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 127: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 133: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 134: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 135: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 136: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 137: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 138: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 139: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 140: Need an operator
make: "/mnt/src/lib/libc/arch/sparc/Makefile.inc" line 141: Need an operator
make: "/mnt/src/lib/libc/Makefile" line 79: Malformed conditional ((${USE_LIBTRE} == "yes"))
make: "/mnt/src/lib/libc/Makefile" line 79: Missing dependency operator
make: Fatal errors encountered -- cannot continue
make: stopped in /mnt/src/lib/libc
Christos Zoulas
2012-02-11 14:57:57 UTC
On Feb 11, 1:11am, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/9/2012 05:56, Christos Zoulas wrote:
| > On Feb 8, 9:31pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | That's where I'm a bit lost since I installed the OS from CD so I only
| > | have the precompiled libraries. The only part of the OS that I've
| > | recompiled ever was the kernel to enable PPS_SYNC. If there's a
| > | document online somewhere that you could point to which would describe
| > | rebuilding the NetBSD libraries I would be grateful. The searches I've
| > | done so far are a bit confusing and lead me to believe that I have to
| > | recompile everything and reinstall nearly from scratch.
| > |
| > | It would also be a good exercise to learn so I can upgrade the system in
| > | the future (also another process that I've not quite understood).
| >
| >
| > If you have the sources online, cd /usr/src/lib/libc/, apply the patch,
| > make && make install. Post to the lists and someone might have a step
| > by step guide.
|
| Well, unfortunately that doesn't seem to work when using the sources
| from NetBSD-current (which already has the patch):

Yes, because these need a more recent set of system makefiles and perhaps a
more recent version of make. In addition this libc assumes that the kernel
has system calls that yours does not have. You should match your userland
with your kernel. I.e. either use the libc from -5 or upgrade everything.

christos
AGC
2012-02-11 20:37:45 UTC
Post by Christos Zoulas
Yes, because these need a more recent set of system makefiles and perhaps a
more recent version of make. In addition this libc assumes that the kernel
has system calls that yours does not have. You should match your userland
with your kernel. I.e. either use the libc from -5 or upgrade everything.
OK, well, the only place I could find the sources for libc was in
NetBSD-current. I've been browsing around the FTP site but I don't see
the sources that I need.

I originally went here:
/pub/NetBSD/NetBSD-current/tar_files/src/

And this had everything, bin, lib, and the rest.

But I can't find a similar tree anywhere else under NetBSD-5.1 (which is
what I have). The only thing I see are the sets. I must not be looking
in the right place but I'm not sure where the right place is.
Manuel Bouyer
2012-02-11 20:58:21 UTC
Post by AGC
But I can't find a similar tree anywhere else under NetBSD-5.1
(which is what I have). The only thing I see are the sets. I must
not be looking in the right place but I'm not sure where the right
place is.
Under NetBSD-5.1 there is a source set.
What you want is in pub/NetBSD/NetBSD-5.1/source/sets/src.tgz
(and you probably need to extract the whole src.tgz to rebuild
libc; maybe you need syssrc too as there are header files in there
that may be needed).
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Christos Zoulas
2012-02-11 21:04:21 UTC
On Feb 11, 12:37pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/11/2012 06:57, Christos Zoulas wrote:
|
| > Yes, because these need a more recent set of system makefiles and perhaps a
| > more recent version of make. In addition this libc assumes that the kernel
| > has system calls that yours does not have. You should match your userland
| > with your kernel. I.e. either use the libc from -5 or upgrade everything.
|
| Ok, well the only place I could find the sources for libc were in
| NetBSD-current. I've been browsing around the ftp site but I don't see
| the sources that I need.
|
| I originally went here:
| /pub/NetBSD/NetBSD-current/tar_files/src/
|
| And this had everything, bin, lib, and the rest.
|
| But I can't find a similar tree anywhere else under NetBSD-5.1 (which is
| what I have). The only thing I see are the sets. I must not be looking
| in the right place but I'm not sure where the right place is.

Get it from cvs:

cvs -d anoncvs.netbsd.org:/cvsroot checkout -r netbsd-5-1 src/lib/libc src/common
Or the whole thing:

cvs -d anoncvs.netbsd.org:/cvsroot checkout -r netbsd-5-1 src

christos
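
For reference, a sketch of how the checkout commands above could be assembled. The anoncvs user name, CVS_RSH=ssh, and the -P flag are assumptions taken from NetBSD's usual anonymous CVS instructions, not from the message itself; checkout_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Sketch: assemble the anonymous CVS checkout for the netbsd-5-1 tag.
# CVSROOT_URL and TAG can be overridden from the environment.
CVSROOT_URL=${CVSROOT_URL:-anoncvs@anoncvs.NetBSD.org:/cvsroot}
TAG=${TAG:-netbsd-5-1}

checkout_cmd() {
    # -P prunes empty directories; the arguments are the modules to fetch.
    echo "env CVS_RSH=ssh cvs -d $CVSROOT_URL checkout -P -r $TAG $*"
}

# Only libc and the shared sources it uses:
checkout_cmd src/lib/libc src/common
# The whole tree:
checkout_cmd src
```

Printing the command first makes it easy to review before starting a download that, on a slow link, may take a long time.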
AGC
2012-02-12 00:34:41 UTC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
|> Yes, because these need a more recent set of system makefiles and perhaps a
|> more recent version of make. In addition this libc assumes that the kernel
|> has system calls that yours does not have. You should match your userland
|> with your kernel. I.e. either use the libc from -5 or upgrade everything.
|
| Ok, well the only place I could find the sources for libc were in
| NetBSD-current. I've been browsing around the ftp site but I don't see
| the sources that I need.
|
| /pub/NetBSD/NetBSD-current/tar_files/src/
|
| And this had everything, bin, lib, and the rest.
|
| But I can't find a similar tree anywhere else under NetBSD-5.1 (which is
| what I have). The only thing I see are the sets. I must not be looking
| in the right place but I'm not sure where the right place is.
cvs -d anoncvs.netbsd.org:/cvsroot checkout -r netbsd-5-1 src/lib/libc src/common
cvs -d anoncvs.netbsd.org:/cvsroot checkout -r netbsd-5-1 src
christos
2# make
make: "/mnt/src/lib/libc/compat/Makefile.inc" line 13: Cannot open /mnt/src/lib/libc/compat/arch//Makefile.inc
.: Can't open /usr/src/lib/i18n_module/shlib_version
make: "/mnt/src/lib/libc/citrus/Makefile.inc" line 9: warning: ". /usr/src/lib/i18n_module/shlib_version ; echo $major" returned non-zero status
make: "/mnt/src/lib/libc/gdtoa/Makefile.inc" line 20: Cannot open /mnt/src/lib/libc/arch//gdtoa/Makefile.inc
make: "/mnt/src/lib/libc/gen/Makefile.inc" line 51: Cannot open /mnt/src/lib/libc/arch//gen/Makefile.inc
make: "/mnt/src/lib/libc/net/Makefile.inc" line 42: Cannot open /mnt/src/lib/libc/arch//net/Makefile.inc
make: "/mnt/src/lib/libc/stdlib/Makefile.inc" line 35: Cannot open /mnt/src/lib/libc/arch//stdlib/Makefile.inc
make: "/mnt/src/lib/libc/string/Makefile.inc" line 33: Cannot open /mnt/src/lib/libc/arch//string/Makefile.inc
make: Fatal errors encountered -- cannot continue
make: stopped in /mnt/src/lib/libc
bash-4.2#
I don't know why it's looking specifically in /usr/src instead of the
relative directory. I do have that file present in
/mnt/src/lib/i18n_module/ so I just created a directory tree in /usr/src
which eliminated that particular error but all the rest still exist and
I'm not sure why it's missing the architecture. Somewhere I'm missing a
config file that defines the architecture.
Christos Zoulas
2012-02-12 00:36:44 UTC
On Feb 11, 4:34pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| I don't know why it's looking specifically in /usr/src instead of the
| relative directory. I do have that file present in
| /mnt/src/lib/i18n_module/ so I just created a directory tree in /usr/src
| which eliminated that particular error but all the rest still exist and
| I'm not sure why it's missing the architecture. Somewhere I'm missing a
| config file that defines the architecture.

Do you have a complete 5.1 tree in /mnt/src?

christos
AGC
2012-02-12 21:09:18 UTC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| I don't know why it's looking specifically in /usr/src instead of the
| relative directory. I do have that file present in
| /mnt/src/lib/i18n_module/ so I just created a directory tree in /usr/src
| which eliminated that particular error but all the rest still exist and
| I'm not sure why it's missing the architecture. Somewhere I'm missing a
| config file that defines the architecture.
Do you have a complete 5.1 tree in /mnt/src?
christos
I don't have the full tree at the moment. I've been working on it. It's
very slow to download and the process has failed twice with errors so I
was hoping to do it with a limited download instead of the full tree.

The last error was from CVS saying that a particular entry was not a
directory and the whole process aborted in the middle.
Christos Zoulas
2012-02-12 21:12:23 UTC
On Feb 12, 1:09pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/11/2012 16:36, Christos Zoulas wrote:
| > On Feb 11, 4:34pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | I don't know why it's looking specifically in /usr/src instead of the
| > | relative directory. I do have that file present in
| > | /mnt/src/lib/i18n_module/ so I just created a directory tree in /usr/src
| > | which eliminated that particular error but all the rest still exist and
| > | I'm not sure why it's missing the architecture. Somewhere I'm missing a
| > | config file that defines the architecture.
| >
| > Do you have a complete 5.1 tree in /mnt/src?
| >
| > christos
| >
| >
|
| I don't have the full tree at the moment. I've been working on it. It's
| very slow to download and the process has failed twice with errors so I
| was hoping to do it with a limited download instead of the full tree.
|
| The last error was from CVS saying that a particular entry was not a
| directory and the whole process aborted in the middle.

Just remove the entry. CVS will pick up from where it left off.

christos
AGC
2012-02-14 04:54:57 UTC
I downloaded the whole tree by CVS but still nothing:

# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127

Stop.
make: stopped in /mnt/src/lib/libc
AGC
2012-02-14 05:46:59 UTC
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src && ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
AGC
2012-02-14 19:54:42 UTC
USETOOLS=no got it compiling up until:

/mnt/src/lib/libc/nameser/ns_name.c(727): syntax error 'nname' [249]
/mnt/src/lib/libc/nameser/ns_name.c(727): warning: argument type defaults to 'int': ns_nname_ct [32]
/mnt/src/lib/libc/nameser/ns_name.c(728): syntax error 'orig' [249]
/mnt/src/lib/libc/nameser/ns_name.c(731): namesiz undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(731): nname undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(731): cannot dereference non-pointer type [96]
/mnt/src/lib/libc/nameser/ns_name.c(732): warning: n may be used before set [158]
/mnt/src/lib/libc/nameser/ns_name.c(743): orig undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(727): warning: argument ns_nname_ct unused in function ns_name_length [231]
/mnt/src/lib/libc/nameser/ns_name.c(749): syntax error 'a' [249]
/mnt/src/lib/libc/nameser/ns_name.c(749): redeclaration of formal parameter ns_nname_ct [21]
/mnt/src/lib/libc/nameser/ns_name.c(749): warning: argument type defaults to 'int': ns_nname_ct [32]
/mnt/src/lib/libc/nameser/ns_name.c(749): warning: argument type defaults to 'int': ns_nname_ct [32]
/mnt/src/lib/libc/nameser/ns_name.c(750): syntax error 'ae' [249]
/mnt/src/lib/libc/nameser/ns_name.c(753): a undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(753): cannot dereference non-pointer type [96]
/mnt/src/lib/libc/nameser/ns_name.c(753): b undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(753): cannot dereference non-pointer type [96]
/mnt/src/lib/libc/nameser/ns_name.c(754): warning: ac may be used before set [158]
/mnt/src/lib/libc/nameser/ns_name.c(754): warning: bc may be used before set [158]
/mnt/src/lib/libc/nameser/ns_name.c(758): ae undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(758): be undefined [99]
/mnt/src/lib/libc/nameser/ns_name.c(749): warning: argument ns_nname_ct unused in function ns_name_eq [231]
/mnt/src/lib/libc/nameser/ns_name.c(749): warning: argument ns_nname_ct unused in function ns_name_eq [231]
/mnt/src/lib/libc/nameser/ns_name.c(774): syntax error 'a' [249]
/mnt/src/lib/libc/nameser/ns_name.c(774): cannot recover from previous errors [224]
*** Error code 1
Christos Zoulas
2012-02-14 22:03:15 UTC
Permalink
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src&& ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
add NOLINT=yes

christos
Alexander Carver
2012-02-14 22:43:46 UTC
Permalink
Post by Christos Zoulas
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src&& ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
add NOLINT=yes
christos
Still errors out but in a slightly different place:

cc1: warnings being treated as errors
/mnt/src/lib/libc/nameser/ns_name.c: In function '__ns_name_pton':
/mnt/src/lib/libc/nameser/ns_name.c:209: warning: implicit declaration
of function 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c: At top level:
/mnt/src/lib/libc/nameser/ns_name.c:225: warning: no previous prototype
for 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c: In function '__ns_name_unpack':
/mnt/src/lib/libc/nameser/ns_name.c:414: warning: implicit declaration
of function 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c: At top level:
/mnt/src/lib/libc/nameser/ns_name.c:428: warning: no previous prototype
for 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:727: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:749: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:774: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:797: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:847: error: expected ')' before 'nname'
*** Error code 1
AGC
2012-02-14 22:44:22 UTC
Permalink
Post by Christos Zoulas
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src&& ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
add NOLINT=yes
christos
Still errors out but in a slightly different place:

cc1: warnings being treated as errors
/mnt/src/lib/libc/nameser/ns_name.c: In function '__ns_name_pton':
/mnt/src/lib/libc/nameser/ns_name.c:209: warning: implicit declaration
of function 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c: At top level:
/mnt/src/lib/libc/nameser/ns_name.c:225: warning: no previous prototype
for 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c: In function '__ns_name_unpack':
/mnt/src/lib/libc/nameser/ns_name.c:414: warning: implicit declaration
of function 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c: At top level:
/mnt/src/lib/libc/nameser/ns_name.c:428: warning: no previous prototype
for 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:727: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:749: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:774: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:797: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:847: error: expected ')' before 'nname'
*** Error code 1
Christos Zoulas
2012-02-15 02:05:23 UTC
Permalink
Post by Alexander Carver
Post by Christos Zoulas
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src&& ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
add NOLINT=yes
christos
cc1: warnings being treated as errors
/mnt/src/lib/libc/nameser/ns_name.c:209: warning: implicit declaration
of function 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c:225: warning: no previous prototype
for 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c:414: warning: implicit declaration
of function 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:428: warning: no previous prototype
for 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:727: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:749: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:774: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:797: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:847: error: expected ')' before 'nname'
the include files seem to be stale. update src/include and make includes there.

christos
Alexander Carver
2012-02-16 20:37:10 UTC
Permalink
Post by Christos Zoulas
Post by Alexander Carver
Post by Christos Zoulas
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
a) cd /mnt/src&& ./build.sh tools
and then run make
or
b) make USETOOLS=no
Thanks, I never found that command line option online. Seems to be
compiling now, we'll see if it goes through.
add NOLINT=yes
christos
cc1: warnings being treated as errors
/mnt/src/lib/libc/nameser/ns_name.c:209: warning: implicit declaration
of function 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c:225: warning: no previous prototype
for 'ns_name_pton2'
/mnt/src/lib/libc/nameser/ns_name.c:414: warning: implicit declaration
of function 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:428: warning: no previous prototype
for 'ns_name_unpack2'
/mnt/src/lib/libc/nameser/ns_name.c:727: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:749: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:774: error: expected ')' before 'a'
/mnt/src/lib/libc/nameser/ns_name.c:797: error: expected ')' before 'nname'
/mnt/src/lib/libc/nameser/ns_name.c:847: error: expected ')' before 'nname'
the include files seem to be stale. update src/include and make includes there.
christos
Did that, now it gets further along but still stops. I keep restarting
make and it seems to move on some more each time.

make: don't know how to make /usr/share/tmac/andoc.tmac. Stop
Christos Zoulas
2012-02-16 20:41:55 UTC
Permalink
On Feb 16, 12:37pm, ***@acarver.net (Alexander Carver) wrote:
-- Subject: Re: ntpd wedged by libc?

| make: don't know how to make /usr/share/tmac/andoc.tmac. Stop


Do you have the text.tgz set installed?

christos
A C
2012-02-16 20:42:48 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| make: don't know how to make /usr/share/tmac/andoc.tmac. Stop
Do you have the text.tgz set installed?
No, I don't think I installed that when I first installed the system.
AGC
2012-02-16 21:39:14 UTC
Permalink
Post by A C
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| make: don't know how to make /usr/share/tmac/andoc.tmac. Stop
Do you have the text.tgz set installed?
No, I don't think I installed that when I first installed the system.
Installed and make is now progressing. Looks like it wanted to build
the man pages.
Christos Zoulas
2012-02-16 22:03:25 UTC
Permalink
On Feb 16, 12:42pm, agcarver+***@acarver.net (A C) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/16/2012 12:41, Christos Zoulas wrote:
| > On Feb 16, 12:37pm, ***@acarver.net (Alexander Carver) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | make: don't know how to make /usr/share/tmac/andoc.tmac. Stop
| >
| >
| > Do you have the text.tgz set installed?
|
| No, I don't think I installed that when I first installed the system.

Install it...

christos
AGC
2012-02-23 05:55:47 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|> -- Subject: Re: ntpd wedged by libc?
|>
|> | make: don't know how to make /usr/share/tmac/andoc.tmac. Stop
|>
|>
|> Do you have the text.tgz set installed?
|
| No, I don't think I installed that when I first installed the system.
Install it...
Finally got it all installed. That was an interesting exercise but
useful. I ended up having to NFS mount another machine so I would have
enough disk space to hold all the sources. The internal drive on the
IPX is only 1.2 GB so it doesn't have much free space.

Looks like the new library took (I hope). Nothing is complaining so I
think it's ok.
Dave Hart
2012-02-23 07:41:18 UTC
Permalink
Finally got it all installed. That was an interesting exercise but useful.
I ended up having to NFS mount another machine so I would have enough disk
space to hold all the sources. The internal drive on the IPX is only 1.2 GB
so it doesn't have much free space.
Looks like the new library took (I hope). Nothing is complaining so I think
it's ok.
You can be more confident if you grab another recent ntp-dev tarball
(p259 is nice) and configure without --enable-c99-snprintf, build and
install it, and subject it to the ntpq -p every 5 seconds regime.

Cheers,
Dave Hart
AGC
2012-02-26 17:30:57 UTC
Permalink
Post by Dave Hart
Finally got it all installed. That was an interesting exercise but useful.
I ended up having to NFS mount another machine so I would have enough disk
space to hold all the sources. The internal drive on the IPX is only 1.2 GB
so it doesn't have much free space.
Looks like the new library took (I hope). Nothing is complaining so I think
it's ok.
You can be more confident if you grab another recent ntp-dev tarball
(p259 is nice) and configure without --enable-c99-snprintf, build and
install it, and subject it to the ntpq -p every 5 seconds regime.
Ok, it didn't work using the new version of libc:

(gdb) bt
#0 0x103b5574 in __multadd_D2A () from /usr/lib/libc.so.12
#1 0x103a7b90 in __dtoa () from /usr/lib/libc.so.12
#2 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#3 0x103133ac in snprintf () from /usr/lib/libc.so.12
#4 0x0005d310 in atom_timer (unit=0, peer=0xafd28) at refclock_atom.c:202
#5 0x0003a990 in refclock_timer (p=0xafd28) at ntp_refclock.c:273
#6 0x00041950 in timer () at ntp_timer.c:300
#7 0x00023320 in ntpdmain (argc=0, argv=0xefffe86c) at ntpd.c:1026
#8 0x00013880 in ___start ()
#9 0x000137b8 in _start ()
(gdb)


So libc is still broken.
Christos Zoulas
2012-02-26 23:15:28 UTC
Permalink
Post by AGC
Post by Dave Hart
Finally got it all installed. That was an interesting exercise but useful.
I ended up having to NFS mount another machine so I would have enough disk
space to hold all the sources. The internal drive on the IPX is only 1.2 GB
so it doesn't have much free space.
Looks like the new library took (I hope). Nothing is complaining so I think
it's ok.
You can be more confident if you grab another recent ntp-dev tarball
(p259 is nice) and configure without --enable-c99-snprintf, build and
install it, and subject it to the ntpq -p every 5 seconds regime.
(gdb) bt
#0 0x103b5574 in __multadd_D2A () from /usr/lib/libc.so.12
#1 0x103a7b90 in __dtoa () from /usr/lib/libc.so.12
#2 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#3 0x103133ac in snprintf () from /usr/lib/libc.so.12
#4 0x0005d310 in atom_timer (unit=0, peer=0xafd28) at refclock_atom.c:202
#5 0x0003a990 in refclock_timer (p=0xafd28) at ntp_refclock.c:273
#6 0x00041950 in timer () at ntp_timer.c:300
#7 0x00023320 in ntpdmain (argc=0, argv=0xefffe86c) at ntpd.c:1026
#8 0x00013880 in ___start ()
#9 0x000137b8 in _start ()
(gdb)
Are you sure you are using the new library?

christos
AGC
2012-02-26 23:26:19 UTC
Permalink
Post by Christos Zoulas
Post by AGC
Post by Dave Hart
Finally got it all installed. That was an interesting exercise but useful.
I ended up having to NFS mount another machine so I would have enough disk
space to hold all the sources. The internal drive on the IPX is only 1.2 GB
so it doesn't have much free space.
Looks like the new library took (I hope). Nothing is complaining so I think
it's ok.
You can be more confident if you grab another recent ntp-dev tarball
(p259 is nice) and configure without --enable-c99-snprintf, build and
install it, and subject it to the ntpq -p every 5 seconds regime.
(gdb) bt
#0 0x103b5574 in __multadd_D2A () from /usr/lib/libc.so.12
#1 0x103a7b90 in __dtoa () from /usr/lib/libc.so.12
#2 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#3 0x103133ac in snprintf () from /usr/lib/libc.so.12
#4 0x0005d310 in atom_timer (unit=0, peer=0xafd28) at refclock_atom.c:202
#5 0x0003a990 in refclock_timer (p=0xafd28) at ntp_refclock.c:273
#6 0x00041950 in timer () at ntp_timer.c:300
#7 0x00023320 in ntpdmain (argc=0, argv=0xefffe86c) at ntpd.c:1026
#8 0x00013880 in ___start ()
#9 0x000137b8 in _start ()
(gdb)
Are you sure you are using the new library?
I'm fairly certain. I did a make install after the libc compile,
rebooted and then recompiled a new version of ntpd.
Christos Zoulas
2012-02-27 02:30:33 UTC
Permalink
On Feb 26, 3:26pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| >> (gdb)
| >
| > Are you sure you are using the new library?
|
| I'm fairly certain. I did a make install after the libc compile,
| rebooted and then recompiled a new version of ntpd.

You don't need to reboot/recompile. Unfortunately we did not put rcsid's
in the gdtoa code so it is not easy to check that you have the right version.
I am pretty sure that the new code does not have the bug, so perhaps the
new libc does not have the right version of misc.c?

christos
AGC
2012-02-27 03:44:04 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>> (gdb)
|>
|> Are you sure you are using the new library?
|
| I'm fairly certain. I did a make install after the libc compile,
| rebooted and then recompiled a new version of ntpd.
You don't need to reboot/recompile. Unfortunately we did not put rcsid's
in the gdtoa code so it is not easy to check that you have the right version.
I am pretty sure that the new code does not have the bug, so perhaps the
new libc does not have the right version of misc.c?
I see the extra memory frees in the gdtoa.c file that were present in
the patch you supplied so I know the code made it into libc. What do I
need to change in misc.c then?
Christos Zoulas
2012-02-27 13:51:41 UTC
Permalink
On Feb 26, 7:44pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 2/26/2012 18:30, Christos Zoulas wrote:
| > On Feb 26, 3:26pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > |>> (gdb)
| > |>
| > |> Are you sure you are using the new library?
| > |
| > | I'm fairly certain. I did a make install after the libc compile,
| > | rebooted and then recompiled a new version of ntpd.
| >
| > You don't need to reboot/recompile. Unfortunately we did not put rcsid's
| > in the gdtoa code so it is not easy to check that you have the right version.
| > I am pretty sure that the new code does not have the bug, so perhaps the
| > new libc does not have the right version of misc.c?
|
| I see the extra memory frees in the gdtoa.c file that were present in
| the patch you supplied so I know the code made it into libc. What do I
| need to change in misc.c then?

Nothing, just compare the misc.c with the one in head and make sure they
are the same. You could also add an __RCSID() in your misc.c so that
ident will find it.

christos
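A rough sketch of the tagging idea, in plain C. This is only an illustration under assumptions: misc_rcsid and find_ident are hypothetical names, the string is the 1.11 banner quoted in this thread, and a plain array stands in for NetBSD's __RCSID() macro so the snippet builds anywhere. The scan imitates, very loosely, how ident looks for "$Keyword: ... $" markers in a file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for __RCSID(): keep the version banner in the
 * object file so a tool such as ident (or strings) can find it later.
 */
const char misc_rcsid[] =
    "$NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $";

/*
 * Hypothetical helper: return a pointer to the first "$keyword:" marker
 * in buf[0..len), or NULL if there is none.
 */
const char *find_ident(const char *buf, size_t len, const char *keyword)
{
    size_t klen;
    size_t i;

    assert(buf != NULL && keyword != NULL);
    klen = strlen(keyword);

    /* Each candidate needs room for '$', the keyword and ':'. */
    for (i = 0; i + klen + 2 <= len; i++) {
        if (buf[i] == '$' &&
            memcmp(buf + i + 1, keyword, klen) == 0 &&
            buf[i + 1 + klen] == ':')
            return buf + i;
    }
    return NULL;
}
```

With a banner like this compiled into misc.o, running ident against the installed libc should list it, which would settle whether the new misc.c really made it into the library.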
AGC
2012-03-02 05:59:26 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|> -- Subject: Re: ntpd wedged by libc?
|>
|> |>> (gdb)
|> |>
|> |> Are you sure you are using the new library?
|> |
|> | I'm fairly certain. I did a make install after the libc compile,
|> | rebooted and then recompiled a new version of ntpd.
|>
|> You don't need to reboot/recompile. Unfortunately we did not put rcsid's
|> in the gdtoa code so it is not easy to check that you have the right version.
|> I am pretty sure that the new code does not have the bug, so perhaps the
|> new libc does not have the right version of misc.c?
|
| I see the extra memory frees in the gdtoa.c file that were present in
| the patch you supplied so I know the code made it into libc. What do I
| need to change in misc.c then?
Nothing, just compare the misc.c with the one in head and make sure they
are the same. You could also add an __RCSID() in your misc.c so that
ident will find it.
The copy I have has a banner reading:

/* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */

The copy I obtained from
ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
(unless this isn't head)

Has a header:
/* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Christos Zoulas
2012-03-02 13:30:09 UTC
Permalink
On Mar 1, 9:59pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| The copy I have has a banner reading:
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| Has a header:
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */

Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
test leaks for you.

christos
AGC
2012-03-03 02:34:23 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
test leaks for you.
I don't see t_printf.c anywhere in
ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/
Dave Hart
2012-03-03 03:56:36 UTC
Permalink
Post by AGC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
test leaks for you.
I don't see t_printf.c anywhere in
ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/
http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/tests/lib/libc/stdio/

Cheers,
Dave Hart
AGC
2012-03-03 04:15:09 UTC
Permalink
Post by Dave Hart
Post by AGC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
test leaks for you.
I don't see t_printf.c anywhere in
ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/
http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/tests/lib/libc/stdio/
Thank you, I completely missed the "tests" directory in the path.
Christos Zoulas
2012-03-03 16:47:20 UTC
Permalink
On Mar 2, 6:34pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/2/2012 05:30, Christos Zoulas wrote:
| > On Mar 1, 9:59pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | The copy I have has a banner reading:
| > |
| > | /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
| > |
| > | The copy I obtained from
| > | ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| > | (unless this isn't head)
| > |
| > | Has a header:
| > | /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
| >
| > Yup, that's the latest.
| > Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
| > test leaks for you.
|
| I don't see t_printf.c anywhere in
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/

ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/tests/lib/libc/stdio/

christos
AGC
2012-03-03 04:38:16 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
test leaks for you.
# gcc -o t_printf t_printf.c
t_printf.c: In function 'atfu_snprintf_float_body':
t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode
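That error comes from declaring the loop counter inside the for statement, which is a C99 feature; a GCC of this vintage defaults to gnu89 unless told otherwise. A minimal sketch of the construct follows (sum_array is a hypothetical name, not something from t_printf.c). It should build with -std=gnu99, the same flag the NetBSD makefiles pass, and produce the same error without it on an older compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add up the n elements of v. */
long sum_array(const long *v, size_t n)
{
    long total = 0;

    assert(v != NULL || n == 0);

    /* Declaring i here, inside the for statement, is what needs C99. */
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Note that a bare "gcc -std=gnu99 -o t_printf t_printf.c" would still need the ATF headers and library, so going through the makefile is the easier route.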
Alexander Carver
2012-03-03 05:19:53 UTC
Permalink
Post by AGC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the
snprintf_float
test leaks for you.
# gcc -o t_printf t_printf.c
t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode
I also tried with the makefile:

# make USETOOLS=never
# compile stdio/t_printf.o
cc -O2 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith
-Wno-sign-compare -Wno-traditional -Wa,--fatal-warnings -Wreturn-type
-Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra
-Wno-unused-parameter -std=gnu99 -Wno-missing-noreturn -Werror -c
-Wno-format-nonliteral t_printf.c
cc1: warnings being treated as errors
t_printf.c: In function 'atfu_snprintf_dotzero_body':
t_printf.c:52: warning: implicit declaration of function 'ATF_REQUIRE_STREQ'
*** Error code 1

Stop.
make: stopped in /mnt/src/tests/lib/libc/stdio
AGC
2012-03-04 03:43:46 UTC
Permalink
Post by Alexander Carver
Post by AGC
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
| /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|
| The copy I obtained from
| ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| (unless this isn't head)
|
| /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
Yup, that's the latest.
Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the
snprintf_float
test leaks for you.
# gcc -o t_printf t_printf.c
t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode
# make USETOOLS=never
# compile stdio/t_printf.o
cc -O2 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith
-Wno-sign-compare -Wno-traditional -Wa,--fatal-warnings -Wreturn-type
-Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra
-Wno-unused-parameter -std=gnu99 -Wno-missing-noreturn -Werror -c
-Wno-format-nonliteral t_printf.c
cc1: warnings being treated as errors
t_printf.c:52: warning: implicit declaration of function
'ATF_REQUIRE_STREQ'
*** Error code 1
Stop.
make: stopped in /mnt/src/tests/lib/libc/stdio
I did search for the atf-c.h file mentioned as an include in t_printf.c
and it does exist on the system:

# find . -name atf-c.h
/mnt/src/dist/atf/atf-c.h

So I don't understand why the compile is failing.
Martin Husemann
2012-03-04 10:26:17 UTC
Permalink
Post by AGC
I did search for the atf-c.h file mentioned as an include in t_printf.c
# find . -name atf-c.h
/mnt/src/dist/atf/atf-c.h
So I don't understand why the compile is failing.
The ATF version in netbsd-5 is significantly older than the one in -current.
You can probably easily add the missing pieces, like:

#define ATF_REQUIRE_STREQ(A, B) ATF_CHECK(strcmp((A),(B))==0)


Martin
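A sketch of how such a shim might be guarded so it only kicks in when the installed ATF lacks the macro. Everything except the ATF_REQUIRE_STREQ definition itself is an assumption for illustration: the ATF_CHECK below is a stand-in that merely counts failures so the snippet compiles without atf-c.h, and demo_streq is a hypothetical helper. In the real t_printf.c you would include <atf-c.h> instead of the stand-in.

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in for the real ATF_CHECK, for illustration only: count failed
 * checks instead of reporting them to the ATF runtime.
 */
static int check_failures;
#define ATF_CHECK(expr) ((expr) ? (void)0 : (void)++check_failures)

/* Supply the macro only if the installed ATF does not already have it. */
#ifndef ATF_REQUIRE_STREQ
#define ATF_REQUIRE_STREQ(A, B) ATF_CHECK(strcmp((A), (B)) == 0)
#endif

/* Hypothetical usage: compare two strings, return failures seen so far. */
int demo_streq(const char *got, const char *want)
{
    assert(got != NULL && want != NULL);
    ATF_REQUIRE_STREQ(got, want);
    return check_failures;
}
```

The #ifndef guard means the same source should keep building unchanged once a newer ATF that provides the macro is installed.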
Christos Zoulas
2012-03-04 21:02:16 UTC
Permalink
On Mar 2, 9:19pm, ***@acarver.net (Alexander Carver) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/2/2012 20:38, AGC wrote:
| > On 3/2/2012 05:30, Christos Zoulas wrote:
| >> On Mar 1, 9:59pm, agcarver+***@acarver.net (AGC) wrote:
| >> -- Subject: Re: ntpd wedged by libc?
| >>
| >> | The copy I have has a banner reading:
| >> |
| >> | /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
| >> |
| >> | The copy I obtained from
| >> | ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| >> | (unless this isn't head)
| >> |
| >> | Has a header:
| >> | /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
| >>
| >> Yup, that's the latest.
| >> Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the
| >> snprintf_float
| >> test leaks for you.
| >
| > # gcc -o t_printf t_printf.c
| > t_printf.c: In function 'atfu_snprintf_float_body':
| > t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode
| >
| >
| >
| I also tried with the makefile:
|
| # make USETOOLS=never
| # compile stdio/t_printf.o
| cc -O2 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith
| -Wno-sign-compare -Wno-traditional -Wa,--fatal-warnings -Wreturn-type
| -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra
| -Wno-unused-parameter -std=gnu99 -Wno-missing-noreturn -Werror -c
| -Wno-format-nonliteral t_printf.c
| cc1: warnings being treated as errors
| t_printf.c: In function 'atfu_snprintf_dotzero_body':
| t_printf.c:52: warning: implicit declaration of function 'ATF_REQUIRE_STREQ'

I guess that requires newer ATF_...

#define ATF_REQUIRE_STREQ(a, b) ATF_REQUIRE(strcmp(a, b) == 0)

christos
AGC
2012-03-05 00:57:08 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>> -- Subject: Re: ntpd wedged by libc?
|>>
|>> |
|>> | /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
|>> |
|>> | The copy I obtained from
|>> | ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
|>> | (unless this isn't head)
|>> |
|>> | /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
|>>
|>> Yup, that's the latest.
|>> Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the
|>> snprintf_float
|>> test leaks for you.
|>
|> # gcc -o t_printf t_printf.c
|> t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode
|>
|>
|>
|
| # make USETOOLS=never
| # compile stdio/t_printf.o
| cc -O2 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith
| -Wno-sign-compare -Wno-traditional -Wa,--fatal-warnings -Wreturn-type
| -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra
| -Wno-unused-parameter -std=gnu99 -Wno-missing-noreturn -Werror -c
| -Wno-format-nonliteral t_printf.c
| cc1: warnings being treated as errors
| t_printf.c:52: warning: implicit declaration of function 'ATF_REQUIRE_STREQ'
I guess that requires newer ATF_...
#define ATF_REQUIRE_STREQ(a, b) ATF_REQUIRE(strcmp(a, b) == 0)
Ok, the first suggested define worked to allow it to compile:

#define ATF_REQUIRE_STREQ(A, B) ATF_CHECK(strcmp((A), (B)) == 0 )

I get this result:
# ./t_printf
Content-Type: application/X-atf-tcs; version="1"

tcs-count: 6
tc-start: snprintf_dotzero
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_dotzero, passed
tc-start: snprintf_posarg
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg, passed
tc-start: snprintf_posarg_width
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg_width, passed
tc-start: snprintf_posarg_error
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg_error, failed, Test case did not exit cleanly;
state was 138
tc-start: snprintf_float
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_float, failed, Test case timed out after 300 seconds
tc-start: sprintf_zeropad
tc-end: sprintf_zeropad, failed, Line 157: strcmp((str),(" nan"))
== 0 not met
Christos Zoulas
2012-03-05 01:27:45 UTC
Permalink
On Mar 4, 4:57pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| Ok, the first suggested define worked to allow it to compile:
|
| #define ATF_REQUIRE_STREQ(A, B) ATF_CHECK(strcmp((A), (B)) == 0 )
|
| I get this result:
| # ./t_printf
| Content-Type: application/X-atf-tcs; version="1"
|
| tcs-count: 6
| tc-start: snprintf_dotzero
| __atf_tc_separator__
| __atf_tc_separator__
| tc-end: snprintf_dotzero, passed
| tc-start: snprintf_posarg
| __atf_tc_separator__
| __atf_tc_separator__
| tc-end: snprintf_posarg, passed
| tc-start: snprintf_posarg_width
| __atf_tc_separator__
| __atf_tc_separator__
| tc-end: snprintf_posarg_width, passed
| tc-start: snprintf_posarg_error
| __atf_tc_separator__
| __atf_tc_separator__
| tc-end: snprintf_posarg_error, failed, Test case did not exit cleanly;
| state was 138
| tc-start: snprintf_float
| __atf_tc_separator__
| __atf_tc_separator__
| tc-end: snprintf_float, failed, Test case timed out after 300 seconds

Add:
atf_tc_set_md_var(tc, "timeout", "3000");

to the setup of that test. I guess it did not finish on time.

christos
AGC
2012-03-06 04:03:29 UTC
Permalink
Post by Christos Zoulas
atf_tc_set_md_var(tc, "timeout", "3000");
to the setup of that test. I guess it did not finish on time.
I had to bump it up to 30,000 but it finally did finish:

# ./t_printf
Content-Type: application/X-atf-tcs; version="1"

tcs-count: 6
tc-start: snprintf_dotzero
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_dotzero, passed
tc-start: snprintf_posarg
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg, passed
tc-start: snprintf_posarg_width
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg_width, passed
tc-start: snprintf_posarg_error
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_posarg_error, failed, Test case did not exit cleanly;
state was 138
tc-start: snprintf_float
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_float, passed
tc-start: sprintf_zeropad
tc-end: sprintf_zeropad, failed, Line 157: strcmp((str),(" nan"))
== 0 not met
Christos Zoulas
2012-03-06 14:22:59 UTC
Permalink
On Mar 5, 8:03pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/4/2012 17:27, Christos Zoulas wrote:
|
| > Add:
| > atf_tc_set_md_var(tc, "timeout", "3000");
| >
| > to the setup of that test. I guess it did not finish on time.
|
| I had to bump it up to 30,000 but it finally did finish:

Then I don't think it is leaking... You can try with the old libc,
and you'll see it will run out of memory.

christos
Dave Hart
2012-03-06 15:01:58 UTC
Permalink
Post by Christos Zoulas
Then I don't think it is leaking... You can try with the old libc,
and you'll see it will run out of memory.
The problem has evolved. At first, ntpd stopped responding because it
ran out of memory, due to a leak triggered by lots of snprintf with
floating point. With the leak so identified now fixed, ntpd is still
reported to be "wedging" (I assume meaning spinning using lots of CPU
and not responding to network traffic) and it's still apparently
related to snprintf of floating points. The opening message of this
thread has a stack trace which I assume came from attaching a debugger
to the spinning ntpd:

======
Seems I'm still having issues with libc on 5.1/sparc specifically with
ntpd wedging when doing math:

#0 0x103d38c8 in __pow5mult_D2A () from /usr/lib/libc.so.12
#1 0x103d3ac4 in __muldi3 () from /usr/lib/libc.so.12
#2 0x103d34dc in __mult_D2A () from /usr/lib/libc.so.12
#3 0x103d3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
#4 0x103c61d4 in __dtoa () from /usr/lib/libc.so.12
#5 0x103c315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
#6 0x103330c4 in snprintf () from /usr/lib/libc.so.12
#7 0x000256f4 in ctl_putdblf (tag=0x87d79 "", fmt=0x88458 "%.3f",
d=4.5623779296875)
at ntp_control.c:1431
======

There have been over 50 messages in the thread, so I think we can all
be forgiven forgetting a detail or two along the way, but I don't
think anyone has suggested the original leak bug hasn't been fixed.
Rather, it seems there is still some sort of problem on "5.1" (not
-current, clearly) on sparc with ntpd being polled every few seconds
by ntpq triggering a hang snprintf'ing with floating point.

The stack trace looks very similar to the first go-around. If
accurate, it suggests the same code still has issues that ntpd's abuse
tickles but t_printf.c doesn't.

Cheers,
Dave Hart
Christos Zoulas
2012-03-06 16:18:40 UTC
Permalink
On Mar 6, 3:01pm, ***@ntp.org (Dave Hart) wrote:
-- Subject: Re: ntpd wedged by libc?

| On Tue, Mar 6, 2012 at 14:22, Christos Zoulas <***@zoulas.com> wrote:
| > On Mar 5, 8:03pm, agcarver+***@acarver.net (AGC) wrote:
| > | I had to bump it up to 30,000 but it finally did finish:
| >
| > Then I don't think it is leaking... You can try with the old libc,
| > and you'll see it will run out of memory.
|
| The problem has evolved. At first, ntpd stopped responding due to out
| of memory due to a leak triggered by lots of snprintf with floating
| point. With the leak so identified now fixed, it's still ntpd is now
| reported to be "wedging" (I assume meaning spinning using lots of CPU
| and not responding to network traffic) and it's still apparently
| related to snprintf of floating points. The opening message of this
| thread has a stack trace which I assume came from attaching a debugger
| to the spinning ntpd:
|
| ======
| Seems I'm still having issues with libc on 5.1/sparc specifically with
| ntpd wedging when doing math:
|
| #0 0x103d38c8 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #1 0x103d3ac4 in __muldi3 () from /usr/lib/libc.so.12
| #2 0x103d34dc in __mult_D2A () from /usr/lib/libc.so.12
| #3 0x103d3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #4 0x103c61d4 in __dtoa () from /usr/lib/libc.so.12
| #5 0x103c315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
| #6 0x103330c4 in snprintf () from /usr/lib/libc.so.12
| #7 0x000256f4 in ctl_putdblf (tag=0x87d79 "", fmt=0x88458 "%.3f",
| d=4.5623779296875)
| at ntp_control.c:1431
| ======
|
| There have been over 50 messages in the thread, so I think we can all
| be forgiven forgetting a detail or two along the way, but I don't
| think anyone has suggested the original leak bug hasn't been fixed.
| Rather, it seems there is still some sort of problem on "5.1" (not
| -current, clearly) on sparc with ntpd being polled every few seconds
| by ntpq triggering a hang snprintf'ing with floating point.
|
| The stack trace looks very similar to the first go-around. If
| accurate, it suggests the same code still has issues that ntpd's abuse
| tickles but t_printf.c doesn't.

Sure, let's change the test to be closer to the ntp one, let's make the
format %.3f for example. The way I tracked it down initially was by
instrumenting all malloc/free's in the dtoa code...

christos
AGC
2012-03-06 18:41:44 UTC
Permalink
Post by Christos Zoulas
| There have been over 50 messages in the thread, so I think we can all
| be forgiven forgetting a detail or two along the way, but I don't
| think anyone has suggested the original leak bug hasn't been fixed.
| Rather, it seems there is still some sort of problem on "5.1" (not
| -current, clearly) on sparc with ntpd being polled every few seconds
| by ntpq triggering a hang snprintf'ing with floating point.
|
| The stack trace looks very similar to the first go-around. If
| accurate, it suggests the same code still has issues that ntpd's abuse
| tickles but t_printf.c doesn't.
Sure, let's change the test to be closer to the ntp one, let's make the
format %.3f for example. The way I tracked it down initially was by
instrumenting all malloc/free's in the dtoa code...
I changed the format to %.6f since there are a few calls within ntpd
that are of that precision or more. I'll recompile and test.
AGC
2012-03-07 08:08:32 UTC
Permalink
Post by AGC
Post by Christos Zoulas
| There have been over 50 messages in the thread, so I think we can all
| be forgiven forgetting a detail or two along the way, but I don't
| think anyone has suggested the original leak bug hasn't been fixed.
| Rather, it seems there is still some sort of problem on "5.1" (not
| -current, clearly) on sparc with ntpd being polled every few seconds
| by ntpq triggering a hang snprintf'ing with floating point.
|
| The stack trace looks very similar to the first go-around. If
| accurate, it suggests the same code still has issues that ntpd's abuse
| tickles but t_printf.c doesn't.
Sure, let's change the test to be closer to the ntp one, let's make the
format %.3f for example. The way I tracked it down initially was by
instrumenting all malloc/free's in the dtoa code...
I changed the format to %.6f since there are a few calls within ntpd
that are of that precision or more. I'll recompile and test.
I ran the test with both %.6f and stepped it up even further all the way
to %.10f with no ill effects. The test passed all the extra precision
formats:

tc-start: snprintf_float
__atf_tc_separator__
__atf_tc_separator__
tc-end: snprintf_float, passed
Christos Zoulas
2012-03-07 12:08:47 UTC
Permalink
On Mar 7, 12:08am, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| I ran the test with both %.6f and stepped it up even further all the way
| to %.10f with no ill effects. The test passed all the extra precision
| formats:

Comment out the sprintf in ntpd or replace it with something else and see
if it survives...

christos
Dave Hart
2012-03-07 17:14:07 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| I ran the test with both %.6f and stepped it up even further all the way
| to %.10f with no ill effects.  The test passed all the extra precision
Comment out the sprintf in ntpd or replace it with something else and see
if it survives...
I'm pretty sure it will. Before the libc repair, AGC found he could
work around the problem by configuring ntpd 4.2.7 with
--enable-C99-snprintf, which forces use of a handrolled
printf/snprintf/vsnprintf replacement package otherwise used only if
the default snprintf is not C99-compliant. That implementation does
not use dtoa() and he found ntpd survived indefinitely.

The key difference between ntpd's use of snprintf and t_printf.c may
be simply the number of calls over time from the same process. AGC is
polling ntpd every 5 seconds with ntpq -p, which triggers about 7
floating point snprintf()s per peer (line of the ntpq -p peers
billboard table). I believe all of those are using "%.3f".

You might be able to expose the bug with t_printf.c by changing it to
generate and snprintf thousands of random floating point numbers.
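
The suggestion above could be sketched roughly as follows. This is only an illustration, not the actual t_printf.c change: the function name stress_snprintf, the seed parameter and the value range are assumptions made for the example, and the "%.3f" format is taken from the ntpq discussion.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format `count` pseudo-random doubles with "%.3f"
 * and return how many conversions failed or did not fit in the buffer.
 * If the conversion code hangs, this function simply never returns.
 */
unsigned long
stress_snprintf(unsigned int seed, unsigned long count)
{
    char buf[64];
    unsigned long failures = 0;
    unsigned long i;

    srand(seed);
    for (i = 0; i < count; i++) {
        /* Pseudo-random value in roughly [-1000, 1000]. */
        double d = ((double)rand() / RAND_MAX - 0.5) * 2000.0;
        int n = snprintf(buf, sizeof(buf), "%.3f", d);

        if (n < 0 || (size_t)n >= sizeof(buf))
            failures++;
    }
    return failures;
}
```

Calling stress_snprintf(1, 1000000) from a test case with a generous timeout would make a hang show up as a timeout rather than a wrong result.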

AGC, approximately how long does it take for ntpd to get wedged with
the latest libc? And how many peers are there in your ntpq -p output?

Cheers,
Dave Hart
AGC
2012-03-07 20:58:50 UTC
Permalink
Post by Dave Hart
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| I ran the test with both %.6f and stepped it up even further all the way
| to %.10f with no ill effects. The test passed all the extra precision
Comment out the sprintf in ntpd or replace it with something else and see
if it survives...
I'm pretty sure it will. Before the libc repair, AGC found he could
work around the problem by configuring ntpd 4.2.7 with
--enable-C99-snprintf, which forces use of a handrolled
printf/snprintf/vsnprintf replacement package otherwise used only if
the default snprintf is not C99-compliant. That implementation does
not use dtoa() and he found ntpd survived indefinitely.
The key difference between ntpd's use of snprintf and t_printf.c may
be simply the number of calls over time from the same process. AGC is
polling ntpd every 5 seconds with ntpq -p, which triggers about 7
floating point snprintf()s per peer (line of the ntpq -p peers
billboard table). I believe all of those are using "%.3f".
You might be able to expose the bug with t_printf.c by changing it to
generate and snprintf thousands of random floating point numbers.
AGC, approximately how long does it take for ntpd to get wedged with
the latest libc? And how many peers are there in your ntpq -p output?
The wedge time has varied. Sometimes it happened within a few hours of
starting ntpd and other times it could take several days. I could never
really predict when it would fail.

There are seven peers total listed in the billboard.

What about the log and statistics files? I believe they're also using
various printf() calls, too, yes?
Dave Hart
2012-03-07 21:57:31 UTC
Permalink
Post by Dave Hart
AGC, approximately how long does it take for ntpd to get wedged with
the latest libc?  And how many peers are there in your ntpq -p output?
The wedge time has varied.  Sometimes it happened within a few hours of
starting ntpd and other times it could take several days.  I could never
really predict when it would fail.
You can probably speed it by pounding harder with ntpq. Use ntpq's -n
option to take out any DNS-related delays, and sleep less between
queries.
There are seven peers total listed in the billboard.
So about 50 ntpq-related %.3f snprintf() calls on the order of every 5 seconds.
What about the log and statistics files?  I believe they're also using
various printf() calls, too, yes?
Assuming you're not running ntpd interactively with -D or -d options
for debug trace output, clockstats, peerstats and loopstats would be
the next most frequent users of snprintf with floating point, but
that's a handful of floating-point-to-text conversions at the rate of
once per poll, or as often as 8 seconds for refclocks. All the traces
you've provided have come through ntp_control.c indicating ntpq-style
NTP mode 6 queries triggered the failure.

Was ntpd using any CPU when it wedged?

Cheers,
Dave Hart
AGC
2012-03-07 22:29:45 UTC
Permalink
Post by Dave Hart
Post by AGC
Post by Dave Hart
AGC, approximately how long does it take for ntpd to get wedged with
the latest libc? And how many peers are there in your ntpq -p output?
The wedge time has varied. Sometimes it happened within a few hours of
starting ntpd and other times it could take several days. I could never
really predict when it would fail.
You can probably speed it by pounding harder with ntpq. Use ntpq's -n
option to take out any DNS-related delays, and sleep less between
queries.
Post by AGC
There are seven peers total listed in the billboard.
So about 50 ntpq-related %.3f snprintf() calls on the order of every 5 seconds.
Post by AGC
What about the log and statistics files? I believe they're also using
various printf() calls, too, yes?
Assuming you're not running ntpd interactively with -D or -d options
for debug trace output, clockstats, peerstats and loopstats would be
the next most frequent users of snprintf with floating point, but
that's a handful of floating-point-to-text conversions at the rate of
once per poll, or as often as 8 seconds for refclocks. All the traces
you've provided have come through ntp_control.c indicating ntpq-style
NTP mode 6 queries triggered the failure.
Yes, I was specifically thinking the general log file (no debugging but
general messages like the fuzz errors, popcorn, etc.) and the peers,
clockstats, loops and sysstats files (I'm recording all of those).
Post by Dave Hart
Was ntpd using any CPU when it wedged?
Yes, when it wedges it uses upwards of 80-90% CPU according to top but
is otherwise unresponsive unless issued a SIGKILL, SIGTERM, or SIGHUP.
I attach gdb to it while it's spinning the CPU; that's where I get the
stack trace. Interestingly, a simple SIGTERM will break the loop and
ntpd will actually close out normally -- it issues the exit messages in
the logs about releasing kernel discipline and then the message "exit on
signal 15" (or whatever the number is for a kill command with no signal
flag, I don't recall).
David Brownlee
2012-03-09 17:02:46 UTC
Permalink
Yes, when it wedges it uses upwards of 80-90% CPU according to top but is
otherwise unresponsive unless issued a SIGKILL, SIGTERM, or SIGHUP. I attach
gdb to it while it's spinning the CPU that's where I get the stack trace.
 Interestingly a simple SIGTERM will break the loop and ntpd will actually
close out normally -- it issues the exit messages in the logs about
releasing kernel discipline and then message "exit on signal 15" (or
whatever the number is for a kill command with no signal flag, I don't
recall).
Apologies if I've misunderstood something here, but if using the hand
rolled snprintf does not use dtoa() and avoids the issue would it make
sense to try one of:

a) Modify the hand rolled snprintf to use dtoa() to confirm it is the
dtoa() calls, plus have it keep a static fd open and just write out
the argument before calling each dtoa(). If it hangs, you have a
history of dtoa() calls which you can replay in a test app to see
which bit pattern or sequence of bit patterns causes the issue.

b) Like a) but add the logging to the system snprintf() and compile a
custom libc.so which is referenced by LD_LIBRARY_PATH

c) If the dtoa() log is unhelpful, similar but logging the stdarg
values to snprintf()...
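
The logging idea in (a) might look something like the sketch below. The names format_double_hex and log_double are made up for this example; it assumes nothing beyond standard C, and it writes the bytes in memory order, so the digit order in the log depends on the machine that produced it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes of d, in memory order, as
 * lowercase hex digits into out and NUL-terminate it.
 */
void
format_double_hex(double d, char out[2 * sizeof(double) + 1])
{
    unsigned char bytes[sizeof(double)];
    size_t i;

    memcpy(bytes, &d, sizeof(bytes));
    for (i = 0; i < sizeof(bytes); i++)
        snprintf(out + 2 * i, 3, "%02x", bytes[i]);
}

/*
 * Hypothetical helper: append one hex line for d and flush right away,
 * so the last line in the log is the value being converted if the
 * process hangs afterwards. Returns 0 on success, -1 on error.
 */
int
log_double(FILE *fp, double d)
{
    char hex[2 * sizeof(double) + 1];

    format_double_hex(d, hex);
    if (fprintf(fp, "%s\n", hex) < 0)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

The flush after every line is the important part: without it, the value that triggers the hang could still be sitting in the stdio buffer when the process is killed.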
Dave Hart
2012-03-09 19:19:49 UTC
Permalink
Post by David Brownlee
Yes, when it wedges it uses upwards of 80-90% CPU according to top but is
otherwise unresponsive unless issued a SIGKILL, SIGTERM, or SIGHUP. I attach
gdb to it while it's spinning the CPU that's where I get the stack trace.
 Interestingly a simple SIGTERM will break the loop and ntpd will actually
close out normally -- it issues the exit messages in the logs about
releasing kernel discipline and then message "exit on signal 15" (or
whatever the number is for a kill command with no signal flag, I don't
recall).
Apologies if I've misunderstood something here, but if using the hand
rolled snprintf does not use dtoa() and avoids the issue would it make
a) Modify the hand rolled snprintf to use dtoa() to confirm it is the
dtoa() calls, plus have it keep a static fd open and just write out
the argument before calling each dtoa(). If it hangs, you have a
history of dtoa() calls which you can replay in a test app to see
which bit pattern or sequence of bit patterns causes the issue.
Attached and inlined below is a patch to ntpd's replacement snprintf()
to log floating point values as hex dumps to "printf_dtoa.log" then
call dtoa(), though it doesn't actually use dtoa's text conversion
result. AGC, if you apply this patch and rebuild ntpd (configured
with --enable-C99-snprintf) the log could be very helpful assuming it
eventually spins infinitely inside dtoa().

The code compiles but I don't have dtoa() on this system to test with,
so I haven't tested it.

Cheers,
Dave Hart


===== libntp/snprintf.c 1.12 vs edited =====
--- 1.12/libntp/snprintf.c 2011-10-21 19:10:01 +00:00
+++ edited/libntp/snprintf.c 2012-03-09 19:08:49 +00:00
@@ -192,6 +192,9 @@
#include <config.h>
#endif /* HAVE_CONFIG_H */

+extern char *dtoa(double d0, int mode, int ndigits, int *decpt, int *sign, char **rve);
+extern void freedtoa(char *s);
+
#if TEST_SNPRINTF
#include <math.h> /* For pow(3), NAN, and INFINITY. */
#include <string.h> /* For strcmp(3). */
@@ -300,6 +303,7 @@
#endif /* TEST_SNPRINTF */

#if HW_WANT_RPL_SNPRINTF || HW_WANT_RPL_VSNPRINTF || HW_WANT_RPL_ASPRINTF || HW_WANT_RPL_VASPRINTF
+#include <math.h> /* dtoa() */
#include <stdio.h> /* For NULL, size_t, vsnprintf(3), and vasprintf(3). */
#ifdef VA_START
#undef VA_START
@@ -1100,6 +1104,11 @@ fmtflt(char *str, size_t *len, size_t si
char iconvert[MAX_CONVERT_LENGTH];
char fconvert[MAX_CONVERT_LENGTH];
char econvert[4]; /* "e-12" (without nul-termination). */
+ static FILE *dtoa_log = NULL;
+ const u_char *puch;
+ int dtoamode;
+ int isneg;
+ int decpt;
char esign = 0;
char sign = 0;
int leadfraczeros = 0;
@@ -1124,6 +1133,24 @@ fmtflt(char *str, size_t *len, size_t si
*/
if (precision == -1)
precision = 6;
+
+ if (NULL == dtoa_log)
+ dtoa_log = fopen("printf_dtoa.log", "w");
+ if (NULL != dtoa_log) {
+ for (puch = (void *)&fvalue;
+ puch < (void *)(&fvalue + 1);
+ puch++)
+ fprintf(dtoa_log, "%02x", *puch);
+ fprintf(dtoa_log, "\n");
+ fflush(dtoa_log);
+ if (flags & PRINT_F_TYPE_E)
+ dtoamode = 2;
+ else if (flags & PRINT_F_TYPE_G)
+ dtoamode = 0;
+ else
+ dtoamode = 3;
+ freedtoa(dtoa(fvalue, dtoamode, precision, &decpt, &isneg, NULL));
+ }

if (fvalue < 0.0)
sign = '-';
AGC
2012-03-10 07:05:10 UTC
Permalink
Post by Dave Hart
Post by David Brownlee
Yes, when it wedges it uses upwards of 80-90% CPU according to top but is
otherwise unresponsive unless issued a SIGKILL, SIGTERM, or SIGHUP. I attach
gdb to it while it's spinning the CPU that's where I get the stack trace.
Interestingly a simple SIGTERM will break the loop and ntpd will actually
close out normally -- it issues the exit messages in the logs about
releasing kernel discipline and then message "exit on signal 15" (or
whatever the number is for a kill command with no signal flag, I don't
recall).
Apologies if I've misunderstood something here, but if using the hand
rolled snprintf does not use dtoa() and avoids the issue would it make
a) Modify the hand rolled snprintf to use dtoa() to confirm it is the
dtoa() calls, plus have it keep a static fd open and just write out
the argument before calling each dtoa(). If it hangs, you have a
history of dtoa() calls which you can replay in a test app to see
which bit pattern or sequence of bit patterns causes the issue.
Attached and inlined below is a patch to ntpd's replacement snprintf()
to log floating point values as hex dumps to "printf_dtoa.log" then
call dtoa(), though it doesn't actually use dtoa's text conversion
result. AGC, if you apply this patch and rebuild ntpd (configured
with --enable-C99-snprintf) the log could be very helpful assuming it
eventually spins infinitely inside dtoa().
The code compiles but I don't have dtoa() on this system to test with,
so I haven't tested it.
Cheers,
Dave Hart
The patch took but the compile fails:

# make
[ ! -r ./../COPYRIGHT ] || [
check-COPYRIGHT-submake -nt ./../COPYRIGHT ] || make
check-COPYRIGHT-submake
cd . && ./scripts/checkChangeLog
make all-recursive
Making all in scripts
Making all in include
Making all in isc
Making all in libntp
make all-am
CC snprintf.o
snprintf.c: In function 'fmtflt':
snprintf.c:1108: error: expected '=', ',', ';', 'asm' or '__attribute__'
before '*' token
snprintf.c:1108: error: 'puch' undeclared (first use in this function)
snprintf.c:1108: error: (Each undeclared identifier is reported only once
snprintf.c:1108: error: for each function it appears in.)
*** Error code 1

Stop.
make: stopped in /usr/src/ntp-dev-4.2.7p259/libntp
*** Error code 1

Stop.
make: stopped in /usr/src/ntp-dev-4.2.7p259/libntp
*** Error code 1

Stop.
make: stopped in /usr/src/ntp-dev-4.2.7p259
*** Error code 1

Stop.
make: stopped in /usr/src/ntp-dev-4.2.7p259
Dave Hart
2012-03-10 08:31:24 UTC
Permalink
Post by AGC
snprintf.c:1108: error: expected '=', ',', ';', 'asm' or '__attribute__'
before '*' token
snprintf.c:1108: error: 'puch' undeclared (first use in this function)
Change line 1108 to:

const unsigned char *puch;

Sorry for the flub,
Dave Hart
AGC
2012-03-10 19:48:56 UTC
Permalink
Post by Dave Hart
Post by AGC
snprintf.c:1108: error: expected '=', ',', ';', 'asm' or '__attribute__'
before '*' token
snprintf.c:1108: error: 'puch' undeclared (first use in this function)
const unsigned char *puch;
Made it a little farther this time but still failed:

../libntp/libntp.a(snprintf.o): In function `fmtflt':
/usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
to `dtoa'
/usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
to `freedtoa'
*** Error code 1
Christos Zoulas
2012-03-10 20:28:41 UTC
Permalink
On Mar 10, 11:48am, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/10/2012 00:31, Dave Hart wrote:
| > On Sat, Mar 10, 2012 at 07:05, AGC<agcarver+***@acarver.net> wrote:
| >> snprintf.c:1108: error: expected '=', ',', ';', 'asm' or '__attribute__'
| >> before '*' token
| >> snprintf.c:1108: error: 'puch' undeclared (first use in this function)
| >
| > Change line 1108 to:
| >
| > const unsigned char *puch;
|
| Made it a little farther this time but still failed:
|
| ../libntp/libntp.a(snprintf.o): In function `fmtflt':
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `dtoa'
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `freedtoa'
| *** Error code 1
-- End of excerpt from AGC

Try __dtoa and __freedtoa

christos
AGC
2012-03-10 21:54:33 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>> snprintf.c:1108: error: expected '=', ',', ';', 'asm' or '__attribute__'
|>> before '*' token
|>> snprintf.c:1108: error: 'puch' undeclared (first use in this function)
|>
|>
|> const unsigned char *puch;
|
|
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `dtoa'
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `freedtoa'
| *** Error code 1
-- End of excerpt from AGC
Try __dtoa and __freedtoa
That fixed it. printf_dtoa.log has been generated so I'll keep an eye
on it and see what happens.
AGC
2012-03-10 22:56:45 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>> snprintf.c:1108: error: expected '=', ',', ';', 'asm' or
'__attribute__'
|>> before '*' token
|>> snprintf.c:1108: error: 'puch' undeclared (first use in this
function)
|>
|>
|> const unsigned char *puch;
|
|
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `dtoa'
| /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| to `freedtoa'
| *** Error code 1
-- End of excerpt from AGC
Try __dtoa and __freedtoa
That fixed it. printf_dtoa.log has been generated so I'll keep an eye on
it and see what happens.
Well, that didn't take long:


(gdb) bt
#0 0x103c3430 in .umul () from /usr/lib/libc.so.12
#1 0x103b58fc in __pow5mult_D2A () from /usr/lib/libc.so.12
#2 0x103b5adc in __muldi3 () from /usr/lib/libc.so.12
#3 0x103b5494 in __mult_D2A () from /usr/lib/libc.so.12
#4 0x103b5728 in __pow5mult_D2A () from /usr/lib/libc.so.12
#5 0x103a8138 in __dtoa () from /usr/lib/libc.so.12
#6 0x00067a04 in fmtflt (str=0xefffe450 " 109.76 97.75 98.77 97.79
103.00 97.72 98.15", len=0xefffe364,
size=184, fvalue=-4.2612301185727119, width=0, precision=2,
flags=0, overflow=0xefffe360) at snprintf.c:1152
#7 0x0006898c in rpl_vsnprintf (str=0xefffe450 " 109.76 97.75 98.77
97.79 103.00 97.72 98.15", size=184,
format=<value optimized out>, args=0xefffe438) at snprintf.c:851
#8 0x00068de4 in rpl_snprintf (str=0xefffe450 " 109.76 97.75 98.77
97.79 103.00 97.72 98.15", size=184,
format=0x8a260 " %.2f") at snprintf.c:1583
#9 0x000252a0 in ctl_putarray (tag=<value optimized out>, arr=0xb3128,
start=2) at ntp_control.c:1673
#10 0x000262e8 in ctl_putpeer (id=30, p=0xb2fa0) at ntp_control.c:2506
#11 0x0002b068 in read_variables (rbufp=0x10528000, restrict_mask=0) at
ntp_control.c:3012
#12 0x00028a44 in process_control (rbufp=0x10528000, restrict_mask=0) at
ntp_control.c:1123
#13 0x00038744 in receive (rbufp=0x10528000) at ntp_proto.c:432
#14 0x00023410 in ntpdmain (argc=0, argv=0xefffebdc) at ntpd.c:1069
#15 0x000138a4 in ___start ()
#16 0x000137dc in _start ()

The printf_dtoa.log file is about 2.2 MB located here:

http://acarver.net/ntpd/printf_dtoa.log
Christos Zoulas
2012-03-11 03:14:05 UTC
Permalink
On Mar 10, 2:56pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/10/2012 13:54, AGC wrote:
| > On 3/10/2012 12:28, Christos Zoulas wrote:
| >> On Mar 10, 11:48am, agcarver+***@acarver.net (AGC) wrote:
| >> -- Subject: Re: ntpd wedged by libc?
| >>
| >> | On 3/10/2012 00:31, Dave Hart wrote:
| >> |> On Sat, Mar 10, 2012 at 07:05, AGC<agcarver+***@acarver.net> wrote:
| >> |>> snprintf.c:1108: error: expected '=', ',', ';', 'asm' or
| >> '__attribute__'
| >> |>> before '*' token
| >> |>> snprintf.c:1108: error: 'puch' undeclared (first use in this
| >> function)
| >> |>
| >> |> Change line 1108 to:
| >> |>
| >> |> const unsigned char *puch;
| >> |
| >> | Made it a little farther this time but still failed:
| >> |
| >> | ../libntp/libntp.a(snprintf.o): In function `fmtflt':
| >> | /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| >> | to `dtoa'
| >> | /usr/src/ntp-dev-4.2.7p259/libntp/snprintf.c:1152: undefined reference
| >> | to `freedtoa'
| >> | *** Error code 1
| >> -- End of excerpt from AGC
| >>
| >> Try __dtoa and __freedtoa
| >
| > That fixed it. printf_dtoa.log has been generated so I'll keep an eye on
| > it and see what happens.
| >
|
| Well, that didn't take long:
|
|
| (gdb) bt
| #0 0x103c3430 in .umul () from /usr/lib/libc.so.12
| #1 0x103b58fc in __pow5mult_D2A () from /usr/lib/libc.so.12
| #2 0x103b5adc in __muldi3 () from /usr/lib/libc.so.12
| #3 0x103b5494 in __mult_D2A () from /usr/lib/libc.so.12
| #4 0x103b5728 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #5 0x103a8138 in __dtoa () from /usr/lib/libc.so.12
| #6 0x00067a04 in fmtflt (str=0xefffe450 " 109.76 97.75 98.77 97.79
| 103.00 97.72 98.15", len=0xefffe364,
| size=184, fvalue=-4.2612301185727119, width=0, precision=2,
| flags=0, overflow=0xefffe360) at snprintf.c:1152
| #7 0x0006898c in rpl_vsnprintf (str=0xefffe450 " 109.76 97.75 98.77
| 97.79 103.00 97.72 98.15", size=184,
| format=<value optimized out>, args=0xefffe438) at snprintf.c:851
| #8 0x00068de4 in rpl_snprintf (str=0xefffe450 " 109.76 97.75 98.77
| 97.79 103.00 97.72 98.15", size=184,
| format=0x8a260 " %.2f") at snprintf.c:1583
| #9 0x000252a0 in ctl_putarray (tag=<value optimized out>, arr=0xb3128,
| start=2) at ntp_control.c:1673
| #10 0x000262e8 in ctl_putpeer (id=30, p=0xb2fa0) at ntp_control.c:2506
| #11 0x0002b068 in read_variables (rbufp=0x10528000, restrict_mask=0) at
| ntp_control.c:3012
| #12 0x00028a44 in process_control (rbufp=0x10528000, restrict_mask=0) at
| ntp_control.c:1123
| #13 0x00038744 in receive (rbufp=0x10528000) at ntp_proto.c:432
| #14 0x00023410 in ntpdmain (argc=0, argv=0xefffebdc) at ntpd.c:1069
| #15 0x000138a4 in ___start ()
| #16 0x000137dc in _start ()
|
| The printf_dtoa.log file is about 2.2 MB located here:
|
| http://acarver.net/ntpd/printf_dtoa.log

Hmm, does not seem to leak for me. Here's how I tested it:


#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/endian.h>

int
main(int argc, char *argv[])
{
        union {
                unsigned long long ull;
                double d;
        } u;
        char line[128];

        /* Discard the output; only the conversion itself matters. */
        FILE *fp = fopen("/dev/null", "w");
        if (fp == NULL)
                err(1, "fopen");

        /* Each input line is one logged value as hex digits. */
        while (fgets(line, sizeof(line), stdin)) {
                char *p = strchr(line, '\n');
                if (p)
                        *p = '\0';
                errno = 0;
                /* be64toh() is a no-op on big-endian hosts. */
                u.ull = be64toh(strtoull(line, &p, 16));
                if (errno)
                        err(1, "line=%s", line);
                else if (p == line || *p)
                        errx(1, "line=%s", line);
                fprintf(fp, "%g", u.d);
        }
        fclose(fp);
        return 0;
}
Dave Hart
2012-03-11 04:14:42 UTC
Permalink
Post by Christos Zoulas
|
|
| (gdb) bt
| #0  0x103c3430 in .umul () from /usr/lib/libc.so.12
| #1  0x103b58fc in __pow5mult_D2A () from /usr/lib/libc.so.12
| #2  0x103b5adc in __muldi3 () from /usr/lib/libc.so.12
| #3  0x103b5494 in __mult_D2A () from /usr/lib/libc.so.12
| #4  0x103b5728 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #5  0x103a8138 in __dtoa () from /usr/lib/libc.so.12
| #6  0x00067a04 in fmtflt (str=0xefffe450 " 109.76 97.75 98.77 97.79
| 103.00 97.72 98.15", len=0xefffe364,
|      size=184, fvalue=-4.2612301185727119, width=0, precision=2,
| flags=0, overflow=0xefffe360) at snprintf.c:1152
| #7  0x0006898c in rpl_vsnprintf (str=0xefffe450 " 109.76 97.75 98.77
| 97.79 103.00 97.72 98.15", size=184,
|      format=<value optimized out>, args=0xefffe438) at snprintf.c:851
| #8  0x00068de4 in rpl_snprintf (str=0xefffe450 " 109.76 97.75 98.77
| 97.79 103.00 97.72 98.15", size=184,
|      format=0x8a260 " %.2f") at snprintf.c:1583
| #9  0x000252a0 in ctl_putarray (tag=<value optimized out>, arr=0xb3128,
| start=2) at ntp_control.c:1673
| #10 0x000262e8 in ctl_putpeer (id=30, p=0xb2fa0) at ntp_control.c:2506
| #11 0x0002b068 in read_variables (rbufp=0x10528000, restrict_mask=0) at
| ntp_control.c:3012
| #12 0x00028a44 in process_control (rbufp=0x10528000, restrict_mask=0) at
| ntp_control.c:1123
| #13 0x00038744 in receive (rbufp=0x10528000) at ntp_proto.c:432
| #14 0x00023410 in ntpdmain (argc=0, argv=0xefffebdc) at ntpd.c:1069
| #15 0x000138a4 in ___start ()
| #16 0x000137dc in _start ()
|
|
| http://acarver.net/ntpd/printf_dtoa.log
Thanks for trying. A question and a suggestion below.
Post by Christos Zoulas
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/endian.h>
int
main(int argc, char *argv[])
{
       union {
               unsigned long long ull;
               double d;
       } u;
       char line[128];
       FILE *fp = fopen("/dev/null", "w");
       while (fgets(line, sizeof(line), stdin)) {
               char *p = strchr(line, '\n');
               if (p)
                       *p = '\0';
               errno = 0;
               u.ull = be64toh(strtoull(line, &p, 16));
Assuming you're testing this on sparc32, is it necessary to byte-swap
with be64toh()? Did you verify, for example, that the last value in
the log interpreted as a double is -4.2612301185727119?
Post by Christos Zoulas
               if (errno)
                       err(1, "line=%s", line);
               else if (p == line || *p)
                       errx(1, "line=%s", line);
               fprintf(fp, "%g", u.d);
You might be able to tickle it by changing this to:

fprintf(fp, "%g %.2f %.3f", u.d, u.d, u.d);
Post by Christos Zoulas
       }
       fclose(fp);
       return 0;
}
Thanks,
Dave Hart
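
One way to check the byte-order question raised above is sketched below. It assumes each log line holds 16 hex digits with the most significant byte first (which is what a byte-by-byte dump in memory order gives on a big-endian machine such as sparc), that double is a 64-bit IEEE 754 type, and that integers and doubles share the same byte order on the host doing the decoding. The name hex_to_double is made up for the example. Under those assumptions strtoull() already yields the bit pattern as a host-order integer, so no further swap is applied here.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: decode exactly 16 hex digits (most significant
 * byte first) into a double. Returns 0 on success, -1 on bad input.
 */
int
hex_to_double(const char *hex, double *out)
{
    unsigned long long bits;
    uint64_t u;
    char *end;
    size_t i;

    if (sizeof(double) != sizeof(u) || strlen(hex) != 16)
        return -1;
    for (i = 0; i < 16; i++)
        if (!isxdigit((unsigned char)hex[i]))
            return -1;

    errno = 0;
    bits = strtoull(hex, &end, 16);
    if (errno != 0 || *end != '\0')
        return -1;

    /* Copy the bit pattern into the double without reinterpreting it. */
    u = (uint64_t)bits;
    memcpy(out, &u, sizeof(*out));
    return 0;
}
```

As a sanity check, the value from the first stack trace in this thread, 4.5623779296875, has the bit pattern 40123fe000000000, so decoding that string should give back the same number. Running the last line of the log through the same function and comparing it with the fvalue shown by gdb would answer the question directly.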
AGC
2012-03-11 05:37:14 UTC
Permalink
Post by Dave Hart
Post by Christos Zoulas
|
Thanks for trying. A question and a suggestion below.
Assuming you're testing this on sparc32, is it necessary to byte-swap
with be64toh()? Did you verify, for example, that the last value in
the log interpreted as a double is -4.2612301185727119?
Post by Christos Zoulas
if (errno)
err(1, "line=%s", line);
else if (p == line || *p)
errx(1, "line=%s", line);
fprintf(fp, "%g", u.d);
fprintf(fp, "%g %.2f %.3f", u.d, u.d, u.d);
Post by Christos Zoulas
}
fclose(fp);
return 0;
}
Thanks,
Dave Hart
I compiled this test both ways and so far it is not leaking at all.
Christos Zoulas
2012-03-11 16:39:01 UTC
Permalink
On Mar 10, 9:37pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/10/2012 20:14, Dave Hart wrote:
| > On Sun, Mar 11, 2012 at 03:14, Christos Zoulas<***@zoulas.com> wrote:
| >> On Mar 10, 2:56pm, agcarver+***@acarver.net (AGC) wrote:
| >> | Well, that didn't take long:
| >> |
| >>
| >> Hmm, does not seem to leak for me. Here's how I tested it:
| >
| > Thanks for trying. A question and a suggestion below.
| >
| >
| > Assuming you're testing this on sparc32, is it necessary to byte-swap
| > with be64toh()? Did you verify, for example, that the last value in
| > the log interpreted as a double is -4.2612301185727119?

It is portable to do it this way. I am testing on amd64. I have sparc64 but
no sparc32.

| > You might be able to tickle it by changing this to:
| >
| > fprintf(fp, "%g %.2f %.3f", u.d, u.d, u.d);

Will try, thanks!

christos
AGC
2012-03-18 19:47:21 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|>> |
|>>
|>
|> Thanks for trying. A question and a suggestion below.
|>
|>
|> Assuming you're testing this on sparc32, is it necessary to byte-swap
|> with be64toh()? Did you verify, for example, that the last value in
|> the log interpreted as a double is -4.2612301185727119?
It is portable to do it this way. I am testing on amd64. I have sparc64 but
no sparc32.
|>
|> fprintf(fp, "%g %.2f %.3f", u.d, u.d, u.d);
Will try, thanks!
Ok, it seems that the memory leak isn't occurring anymore but there may
be an infinite loop problem. Using the snprintf workaround in ntpd, it
still wedged a couple of days ago using over 80% CPU. The memory usage
jumped up some (from 5M to 10M) but it did not go beyond that even
though I let it run in the wedged condition for hours.

The back trace is below. The daemon was running for about 12 days
straight with no problems and then suddenly wedged.

Could this be a threads issue? Reading the header in
libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
Is there a chance for cross thread clobbering or is it just a case of a
termination condition not being met in a loop and letting things spin
out of control?

#0 0x103c33fc in .umul () from /usr/lib/libc.so.12
#1 0x103b58ec in __pow5mult_D2A () from /usr/lib/libc.so.12
#2 0x103b5adc in __muldi3 () from /usr/lib/libc.so.12
#3 0x103b5494 in __mult_D2A () from /usr/lib/libc.so.12
#4 0x103b56e4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#5 0x103a8138 in __dtoa () from /usr/lib/libc.so.12
#6 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#7 0x103a64f8 in vfprintf () from /usr/lib/libc.so.12
#8 0x103a184c in fprintf () from /usr/lib/libc.so.12
#9 0x00042518 in record_peer_stats (addr=0xb3890, status=37917,
offset=0.001765422523021698, delay=0.077142598573118448,
dispersion=0.0025113349042817576, jitter=0.0001220703125) at
ntp_util.c:536
#10 0x00036f38 in clock_filter (peer=0xb3880,
sample_offset=0.001765422523021698, sample_delay=<value optimized out>,
sample_disp=0.00012418112579500303) at ntp_proto.c:2350
#11 0x0003772c in process_packet (peer=0xb3880, pkt=0x10522058, len=48)
at ntp_proto.c:1776
#12 0x00039110 in receive (rbufp=0x10522000) at ntp_proto.c:1459
#13 0x000233e8 in ntpdmain (argc=0, argv=0xefffedf4) at ntpd.c:1069
#14 0x0001387c in ___start ()
#15 0x000137b4 in _start ()
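
[Editor's sketch] On the question of a loop termination condition: the following is a simplified illustration of the square-and-multiply pattern that a routine computing b * 5^k typically uses. It is not the NetBSD gdtoa source. The name pow5mult_sketch is hypothetical, and uint64_t stands in for the arbitrary-precision type, so it is only valid while the result fits in 64 bits (for b = 1, up to k = 27). The point it shows is that the loop counter k is shifted right on every pass, so in a correctly compiled version the loop runs at most once per bit of k.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pow5mult-style routine:
 * returns b * 5^k using square-and-multiply. */
static uint64_t pow5mult_sketch(uint64_t b, unsigned k)
{
    static const uint64_t small[3] = { 5, 25, 125 };
    uint64_t p5 = 625;             /* 5^4, squared as the loop proceeds */
    unsigned i = k & 3;

    if (i != 0)
        b *= small[i - 1];         /* handle the low two bits of k directly */
    k >>= 2;

    while (k != 0) {
        if (k & 1)
            b *= p5;
        k >>= 1;                   /* k shrinks every pass, so the loop ends */
        if (k != 0)
            p5 *= p5;              /* 5^4 -> 5^8 -> 5^16 ... */
    }
    return b;
}
```

If a loop of this shape spins, the likely suspects are the multiplication it calls or corrupted state, rather than the counter itself; that is consistent with the later backtraces stopping inside __mult_D2A.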
Christos Zoulas
2012-03-18 21:10:10 UTC
Permalink
On Mar 18, 12:47pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem. Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU. The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
|
| The back trace is below. The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
|
| Could this be a threads issue? Reading the header in
| libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?

Could we be looking at a compiler bug here?

christos
Dave Hart
2012-03-18 22:00:34 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem.  Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU.  The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
|
| The back trace is below.  The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
|
| Could this be a threads issue?  Reading the header in
| libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?
Could we be looking at a compiler bug here?
One test going down that road would be to recompile libc (or select
parts) with compiler optimization disabled. I'm not familiar with the
NetBSD build setup. Would someone in the know help AGC by describing
how to recompile without optimization on netbsd 5.x sparc32?

Thanks,
Dave Hart
AGC
2012-04-02 19:29:25 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem. Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU. The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
|
| The back trace is below. The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
|
| Could this be a threads issue? Reading the header in
| libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?
Could we be looking at a compiler bug here?
christos
I tried something when ntpd got hung today. I used the debugger to
forcibly end the stuck call:

(gdb) bt
#0 0x103b5480 in __mult_D2A () from /usr/lib/libc.so.12
#1 0x103b56e4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#2 0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12
#3 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#4 0x103a64f8 in vfprintf () from /usr/lib/libc.so.12
#5 0x103a184c in fprintf () from /usr/lib/libc.so.12
#6 0x000424bc in record_loop_stats (offset=6.920439999997807e-05,
freq=-7.7587471658325353e-05,
jitter=0.00012207031250000005, wander=1.2000280120993315e-09,
spoll=4) at ntp_util.c:584
#7 0x00031760 in local_clock (peer=0xb29a0,
fp_offset=6.920439999997807e-05) at ntp_loopfilter.c:666
#8 0x00036524 in clock_select () at ntp_proto.c:1851
#9 0x00037024 in clock_filter (peer=0xb29a0,
sample_offset=6.920439999997807e-05,
sample_delay=<value optimized out>, sample_disp=0.0001220703125) at
ntp_proto.c:2360
#10 0x0003babc in refclock_receive (peer=0xb29a0) at ntp_refclock.c:556
#11 0x0003be50 in refclock_transmit (peer=0xb29a0) at ntp_refclock.c:335
#12 0x00041724 in timer () at ntp_timer.c:320
#13 0x000233cc in ntpdmain (argc=0, argv=0xefffec10) at ntpd.c:1026
#14 0x0001382c in ___start ()
#15 0x00013764 in _start ()

(gdb) return
Make selected stack frame return now? (y or n) y
#0 0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12


This seems to have gotten it unstuck because ntpd started running
normally again as soon as I exited gdb.

It was in the middle of writing to a file when the code bombed. The
area around the log file was (not that I think it's terribly useful):

56019 40827.814 0.000070325 -77.589 0.000122070 0.001219 4
56019 40843.823 0.\x00\x00\x00 -77.587 0.000122070 0.001200 4
56019 50415.027 0.000000000 -77.587 0.000122070 0.001123 6


So it appears there's an infinite loop occurring in __mult_D2A (or
possibly above it in __pow5mult_D2A)
Christos Zoulas
2012-04-02 19:59:40 UTC
Permalink
On Apr 2, 12:29pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| I tried something when ntpd got hung today. I used the debugger to
| forcibly end the stuck call:

Ok, hung, but not spinning? Is it taking a lot of CPU or not?

| This seems to have gotten it unstuck because ntpd started running
| normally again as soon as I exited gdb.
|
| It was in the middle of writing to a file when the code bombed. The
| area around the log file was (not that I think it's terribly useful):
|
| 56019 40827.814 0.000070325 -77.589 0.000122070 0.001219 4
| 56019 40843.823 0.\x00\x00\x00 -77.587 0.000122070 0.001200 4
| 56019 50415.027 0.000000000 -77.587 0.000122070 0.001123 6
|
|
| So it appears there's an infinite loop occurring in __mult_D2A (or
| possibly above it in __pow5mult_D2A)

One of the things you can try to do is to comment out the ACQUIRE/FREE
lock calls with a (1) argument. This can waste a bit more memory, but
will avoid deadlocks or long waits in the pow5mult computation.

christos
AGC
2012-04-02 20:01:58 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
| I tried something when ntpd got hung today. I used the debugger to
Ok, hung, but not spinning? Is it taking a lot of CPU or not?
Yes, it is spinning at 80-90% CPU.
Post by Christos Zoulas
| This seems to have gotten it unstuck because ntpd started running
| normally again as soon as I exited gdb.
|
| It was in the middle of writing to a file when the code bombed. The
|
| 56019 40827.814 0.000070325 -77.589 0.000122070 0.001219 4
| 56019 40843.823 0.\x00\x00\x00 -77.587 0.000122070 0.001200 4
| 56019 50415.027 0.000000000 -77.587 0.000122070 0.001123 6
|
|
| So it appears there's an infinite loop occurring in __mult_D2A (or
| possibly above it in __pow5mult_D2A)
One of the things you can try to do is to comment out the ACQUIRE/FREE
lock calls with a (1) argument. This can waste a bit more memory, but
will avoid deadlocks or long waits in the pow5mult computation.
I can give that a try later this week.
Christos Zoulas
2012-04-02 20:48:48 UTC
Permalink
On Apr 2, 1:01pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 4/2/2012 12:59, Christos Zoulas wrote:
| > On Apr 2, 12:29pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | I tried something when ntpd got hung today. I used the debugger to
| > | forcibly end the stuck call:
| >
| > Ok, hung, but not spinning? Is it taking a lot of CPU or not?
|
| Yes, it is spinning at 80-90% CPU.

Hmm, too bad.

| > | This seems to have gotten it unstuck because ntpd started running
| > | normally again as soon as I exited gdb.
| > |
| > | It was in the middle of writing to a file when the code bombed. The
| > | area around the log file was (not that I think it's terribly useful):
| > |
| > | 56019 40827.814 0.000070325 -77.589 0.000122070 0.001219 4
| > | 56019 40843.823 0.\x00\x00\x00 -77.587 0.000122070 0.001200 4
| > | 56019 50415.027 0.000000000 -77.587 0.000122070 0.001123 6
| > |
| > |
| > | So it appears there's an infinite loop occurring in __mult_D2A (or
| > | possibly above it in __pow5mult_D2A)
| >
| > One of the things you can try to do is to comment out the ACQUIRE/FREE
| > lock calls with a (1) argument. This can waste a bit more memory, but
| > will avoid deadlocks or long waits in the pow5mult computation.
|
| I can give that a try later this week.

I don't think that it will work then, because spinning means not stuck at
a lock. I would try it anyway.

christos

Christos Zoulas
2012-03-08 02:58:00 UTC
Permalink
On Mar 7, 5:14pm, ***@ntp.org (Dave Hart) wrote:
-- Subject: Re: ntpd wedged by libc?

| On Wed, Mar 7, 2012 at 12:08, Christos Zoulas <***@zoulas.com> wrote:
| > On Mar 7, 12:08am, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | I ran the test with both %.6f and stepped it up even further all the way
| > | to %.10f with no ill effects. The test passed all the extra precision
| > | formats:
| >
| > Comment out the sprintf in ntpd or replace it with something else and see
| > if it survives...
|
| I'm pretty sure it will. Before the libc repair, AGC found he could
| work around the problem by configuring ntpd 4.2.7 with
| --enable-C99-snprintf, which forces use of a handrolled
| printf/snprintf/vsnprintf replacement package otherwise used only if
| the default snprintf is not C99-compliant. That implementation does
| not use dtoa() and he found ntpd survived indefinitely.
|
| The key difference between ntpd's use of snprintf and t_printf.c may
| be simply the number of calls over time from the same process. AGC is
| polling ntpd every 5 seconds with ntpq -p, which triggers about 7
| floating point snprintf()s per peer (line of the ntpq -p peers
| billboard table). I believe all of those are using "%.3f".
|
| You might be able to expose the bug with t_printf.c by changing it to
| generate and snprintf thousands of random floating point numbers.
|
That is what t_printf does... One million snprintf random numbers. Another
thing to do is to run top while it is running to see if it grows...

christos
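
[Editor's sketch] For readers without the NetBSD tree at hand, the kind of stress loop discussed here can be approximated as below. This is not t_printf.c. The names lcg_next and stress_snprintf are hypothetical, the generator is a small linear congruential one chosen only so that runs are reproducible, and "%.3f" is used because the thread suggests that is the format ntpd calls most often.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: small LCG so every run formats the same values. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Format `count` pseudo-random doubles in [-1000, 1000) with "%.3f".
 * Returns how many calls failed or produced output that did not fit. */
static unsigned long stress_snprintf(unsigned long count, uint32_t seed)
{
    char buf[64];
    unsigned long bad = 0;

    for (unsigned long i = 0; i < count; i++) {
        double d = ((double)lcg_next(&seed) / 4294967296.0 - 0.5) * 2000.0;
        int n = snprintf(buf, sizeof buf, "%.3f", d);
        if (n < 0 || (size_t)n >= sizeof buf)
            bad++;
    }
    return bad;
}
```

Calling stress_snprintf(1000000UL, 1u) from main() while watching the process in top gives a rough equivalent of the check described above. A leak or a wedge in the floating-point conversion path would show up as growing memory or a run that never finishes.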
AGC
2012-03-08 05:41:05 UTC
Permalink
Post by Christos Zoulas
| You might be able to expose the bug with t_printf.c by changing it to
| generate and snprintf thousands of random floating point numbers.
|
That is what t_printf does... One million snprintf random numbers. Another
thing to do is to run top while it is running to see if it grows...
No change in top while it's running. It stays at 3228K/804K(resident)
the entire time.
Christos Zoulas
2012-03-08 17:40:51 UTC
Permalink
On Mar 7, 9:41pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/7/2012 18:58, Christos Zoulas wrote:
| > On Mar 7, 5:14pm, ***@ntp.org (Dave Hart) wrote:
|
| > | You might be able to expose the bug with t_printf.c by changing it to
| > | generate and snprintf thousands of random floating point numbers.
| > |
| > That is what t_printf does... One million snprintf random numbers. Another
| > thing to do is to run top while it is running to see if it grows...
|
| No change in top while it's running. It stays at 3228K/804K(resident)
| the entire time.

So something else must be leaking. You said that with the c99 snprintf
it does not leak anymore? So we've settled that this is snprintf/dtoa
related?

christos
AGC
2012-03-09 03:06:38 UTC
Permalink
Post by Christos Zoulas
-- Subject: Re: ntpd wedged by libc?
|
|> | You might be able to expose the bug with t_printf.c by changing it to
|> | generate and snprintf thousands of random floating point numbers.
|> |
|> That is what t_printf does... One million snprintf random numbers. Another
|> thing to do is to run top while it is running to see if it grows...
|
| No change in top while it's running. It stays at 3228K/804K(resident)
| the entire time.
So something else must be leaking. You said that with the c99 snprintf
it does not leak anymore? So we've settled that this is snprintf/dtoa
related?
Correct, using the hand-coded snprintf replacement inside ntpd's code
(activated by a configure switch) ntpd does not leak memory that I can
tell. Other things that use libc appear to leak somewhat but I'm unsure
what as the rate of leakage is slow on those processes.
Christos Zoulas
2012-03-09 13:58:10 UTC
Permalink
On Mar 8, 7:06pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| Correct, using the hand-coded snprintf replacement inside ntpd's code
| (activated by a configure switch) ntpd does not leak memory that I can
| tell. Other things that use libc appear to leak somewhat but I'm unsure
| what as the rate of leakage is slow on those processes.


So that leaves other parts of stdio that are different. Can you try
updating to the head version of vfwprintf()? Another thing to do is
to compile ntpd with a debugging malloc to see where the leak is.

christos
Christos Zoulas
2012-03-04 21:00:43 UTC
Permalink
On Mar 2, 8:38pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| On 3/2/2012 05:30, Christos Zoulas wrote:
| > On Mar 1, 9:59pm, agcarver+***@acarver.net (AGC) wrote:
| > -- Subject: Re: ntpd wedged by libc?
| >
| > | The copy I have has a banner reading:
| > |
| > | /* $NetBSD: misc.c,v 1.4 2008/03/21 23:13:48 christos Exp $ */
| > |
| > | The copy I obtained from
| > | ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/lib/libc/gdtoa/misc.c
| > | (unless this isn't head)
| > |
| > | Has a header:
| > | /* $NetBSD: misc.c,v 1.11 2011/11/21 09:46:19 mlelstv Exp $ */
| >
| > Yup, that's the latest.
| > Get /usr/src/tests/lib/libc/stdio/t_printf.c and see if the snprintf_float
| > test leaks for you.
|
| # gcc -o t_printf t_printf.c
| t_printf.c: In function 'atfu_snprintf_float_body':
| t_printf.c:129: error: 'for' loop initial declaration used outside C99 mode

gcc -std=c99 -o t_printf t_printf.c

christos
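
[Editor's sketch] The compiler error quoted above comes from declaring the loop variable inside the for statement, which older gcc releases reject unless C99 mode is selected. The two functions below are hypothetical examples, not code from t_printf.c. The first needs -std=c99 (or later) with such compilers; the second compiles in either mode.

```c
/* C99 style: loop variable declared in the for statement itself. */
static int sum_c99(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* C89 style: loop variable declared at the top of the block. */
static int sum_c89(const int *v, int n)
{
    int total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Passing -std=c99 is the less intrusive fix when the source file is someone else's test.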
Brett Lymn
2012-02-14 05:10:55 UTC
Permalink
Post by Christos Zoulas
# make
building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
*** Error code 127
Stop.
make: stopped in /mnt/src/lib/libc
either:

a) cd /mnt/src && ./build.sh tools
and then run make

or

b) make USETOOLS=no
--
Brett Lymn
Christos Zoulas
2012-02-14 13:50:27 UTC
Permalink
On Feb 13, 8:54pm, agcarver+***@acarver.net (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| I downloaded the whole tree by CVS but still nothing:
|
| # make
| building rem.S from /mnt/src/lib/libc/arch/sparc/gen/divrem.m4
| sh: /mnt/src/tooldir.NetBSD-5.1-sparc/bin/nbm4: not found
| *** Error code 127
|

make USETOOLS=never

christos