Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
Embedded applications today are mostly 8 bit, but many, many designers have already begun the transition to 16 bit, and soon will be moving to 32 bit. The reasons are much the same as the reasons that general computing has moved from 8 to 16 to 32 and now to 64, with the cutting edge already looking at 128 bit and parallel processing, along with dedicated processors running 32 or 64 bit floating point math. Also, the length of the integer used in C, which is a virtual machine, is independent of the word length of the processor, except that the C language designers (originally Kernighan and Ritchie) made the language somewhat flexible to simplify migration. That is why there were some undefined situations in the original specification. Remember that C is a virtual machine language, whose processor only has 24 instructions (I think the ANSI committee added a couple, but they have specific uses that were not foreseen in the original usage of the language). It can be ported to any machine currently extant by writing only about 1K of machine code, and even that can be done in another available higher level language if you so desire, as long as it is compiled for efficiency.
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
The only CVM I can find is Java's JVM. They have modified gcc (a C compiler) to produce byte code for that JVM.
Every compiler I've used compiles C to native machine code for the target platform. There is no intermediate language, and that's what gave C its famous speed.
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
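To see what that means in practice, here is a rough sketch (in present-day C, not taken from any actual compiler) of the work hiding behind a single 16 bit add when the hardware only gives you 8 bit arithmetic: add the low bytes, detect the carry, and fold it into the high-byte add.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch (not from any real compiler): one 16 bit add
   synthesized from 8 bit pieces, roughly what a compiler for an 8 bit
   CPU has to emit for a single "a + b".                               */
uint16_t add16(uint16_t a, uint16_t b)
{
    uint8_t alo = (uint8_t)a, ahi = (uint8_t)(a >> 8);
    uint8_t blo = (uint8_t)b, bhi = (uint8_t)(b >> 8);

    uint8_t lo    = (uint8_t)(alo + blo);  /* wraps modulo 256          */
    uint8_t carry = (uint8_t)(lo < alo);   /* a wrap means a carry out  */
    uint8_t hi    = (uint8_t)(ahi + bhi + carry);

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

int main(void)
{
    printf("%u\n", (unsigned)add16(0x00FF, 0x0001)); /* 256 */
    return 0;
}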
Chris
On Sat, 2007-05-26 at 08:03 -0500, Chris Schumann wrote:
Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
[snip]
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
The only CVM I can find is Java's JVM. They have modified gcc (a C compiler) to produce byte code for that JVM.
Every compiler I've used compiles C to native machine code for the target platform. There is no intermediate language, and that's what gave C its famous speed.
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
Chris
Compiled C is only about 1.8x less efficient than assembly code, so it is difficult to obtain better throughput than C by using assembly. However, for drivers and frequently used libraries, most implementors do perform some hand assembly optimization, both to optimize the use of the processor and to directly address some things that C doesn't do quite as well. As to the integer issue, you are correct: the numerical representation limitation was why 16 bits was chosen, and Microsoft stayed with it far longer than other implementors.

As to the VM, while it is not present in the actual code, I did port C in the early 80's across a couple of processors for my own use. It was not at all difficult, and the standard I/O library as described by K&R is written in C, for portability. At one time on CP/M, you could get a C compiler and all the utilities to bring it up on a new system on a single 5" floppy, and there was even a version released on audio cassette tape compatible with the Tarbell interface (300 baud to an audio cassette, if I remember right).
It has been a long time and I may not remember this clearly. So, I looked and can no longer find reference to the base machine design, nor do I have my original K&R book in my library. It may be in storage with my Altair. So I will bow to your statement, because I can no longer find the original documents which I "remember". I put it in quotes because apparently I am incorrect in my memory.
I dimly have the memory of how I ported the language a long time ago, but apparently memory is faulty here.
C is a wonderful language, and I have been using it since I first purchased a Processor Technology version, and the K & R book many decades ago. You have to tip your hat to K&R for a marvelous creation that has performed so well, that nearly 40 years later it is still one of the most efficient and powerful languages available. One of the highest tributes to their genius is the use of C to design and build other languages, often with the authors describing the limitations of C to their task.
To your comment about the libraries, I have to point out that there are only a few libraries that were part of the original C library package. A lot of the others were added by different implementers, many of whom did not use C to develop their libraries, so their code is indeed often difficult to read, debug, implement or port to new systems.
I seem to remember that the first fully implemented VM I ever saw was PASCAL by Kernighan and Plauger, which actually implemented a P-code and a true VM.
On another note: The first commercial personal computer I ever saw had wooden sides, a metal wrap-around frame that held the motherboard and keyboard, used a Motorola processor (6502 I think), and an RF interface. I helped the officer who bought it by fixing some of his poor soldering and replacing one bad IC, a NAND gate used for selecting RAM banks. Care to guess what that computer was?
At that time I didn't own a computer, but after having seen his, I went into Tokyo and bought some parts on the part of the Ginza that was later to become known as Akihabara, although I don't remember it being called that at the time.
I just realized that I have been at this stuff for more than 30 years. I feel old now. THANKS!! ;-)
regards, Les H
On Sat, 2007-05-26 at 08:03 -0500, Chris Schumann wrote:
Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
[snip]
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
Having been on the ANSI C committee back in the day, Chris is correct. C was NEVER a "virtual machine language" system. That was left to Niklaus Wirth's team with the old UCSD P-System Pascal language and its derivatives (such as Modula-2).
Most compilers had multiple passes: a preprocessor (which processed the "#" statements), a tokenizer, and a machine-specific code generator. Whitesmiths' C made this REALLY clear, as the preprocessor was called "cpp", the tokenizer "cp1" and the code generator "cp211" for the PDP-11, "cp2vax" for the VAX, "cp286" for the Intel x86 (186 and 286 in those days) and so on.
In fact, the existence of a preprocessor is what gave you the first implementation of C++. Bjarne Stroustrup replaced the preprocessor with a new version called "cfront" and you got C++ (or "incremental C").
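For what it's worth, that same separation of passes is still visible in a modern gcc if you care to poke at it; roughly the following, with hello.c standing in for any source file:

$ gcc -E hello.c -o hello.i   # preprocess only: expand #include and #define
$ gcc -S hello.i -o hello.s   # compile the preprocessed source to assembly
$ gcc -c hello.s -o hello.o   # assemble into an object file
$ gcc hello.o -o hello        # link against the C library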
The only CVM I can find is Java's JVM. They have modified gcc (a C compiler) to produce byte code for that JVM.
Every compiler I've used compiles C to native machine code for the target platform. There is no intermediate language, and that's what gave C its famous speed.
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
Actually, that was a huge bone of contention in the committee meetings. The 16-bit integer was the "native" size of the registers on the machine that C was developed on (the PDP-11), so it sorta stuck. However, the standard makes absolutely no guarantees on how big an "int" is. It is completely up to the implementer of the compiler as to how big an "int" is (or a "char" is or an "unsigned long long" is...that's why the "sizeof" operator exists).
There is also no guarantee as to what a "null" is, other than no legitimate object will ever have a "null" address. I know of one system where the null address is all ones. The compiler knows that '\0' should be converted to the all ones format as well as when the word "null" should be, too.
P.J. Plauger (founder of Whitesmiths') and the committee secretary had a great way of putting it: "An int is not a char and it's not a long, but will be the same as one or the other at various times in your career."
As far as the libraries are concerned, the initial draft of what was in the standard C library made the library so damned big that it wouldn't fit in the standard process memory footprint of a VAX at the time. That's when it got split up into the standard library, the "math" library, the "double precision" library and several others.
Once the concept of splitting up libraries came up, lots of splits were proposed: string handling was going to be in a separate library, network stuff, file management, you name it. Some people actually did implement separate libraries, as the famous Sun network library split shows.
----------------------------------------------------------------------
- Rick Stevens, Principal Engineer            rstevens@internap.com  -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
- grasshopotomaus: A creature that can leap to tremendous heights... -
-                                                           ...once. -
----------------------------------------------------------------------
On Tue, 2007-05-29 at 11:03 -0700, Rick Stevens wrote:

I have to admit to confusing PASCAL and C. I used both in the early years, along with Basic. I was in the Navy and couldn't afford software. I wrote my own from the language descriptions in the books. This included editors, compilers, preprocessors (I used the same preprocessor with all my compilers), and debuggers. I still have a collection of a small box of audio tapes in storage with my Altair that contains much of that code. It was easier to make it portable than to write another, especially since it was in assembly code for the Z80 at the time. I also wrote my own virtual machine for Lisp because the Z80 I had kept overwriting the stack, which messed up the stack-oriented Lisp code. Because I wrote so much of this stuff, and ported it to multiple machines, I simply confused it all.
I should have checked the manuals before I responded. I remember reading the arguments about size, but I no longer remember the details you mentioned, Rick. I have to plead stupidity for not checking the manuals and old age for bad memory. But a lot of this occurred more than 30 years ago, and a lot of code has passed through the old gray matter since, so please forgive my frailty, and excuse my stupidity, and Thank You for straightening me out.
Regards, Les Howell
On Tue, 2007-05-29 at 12:00 -0700, Les wrote:
On Tue, 2007-05-29 at 11:03 -0700, Rick Stevens wrote:

I have to admit to confusing PASCAL and C. I used both in the early years, along with Basic. I was in the Navy and couldn't afford software. I wrote my own from the language descriptions in the books. This included editors, compilers, preprocessors (I used the same preprocessor with all my compilers), and debuggers. I still have a collection of a small box of audio tapes in storage with my Altair that contains much of that code. It was easier to make it portable than to write another, especially since it was in assembly code for the Z80 at the time. I also wrote my own virtual machine for Lisp because the Z80 I had kept overwriting the stack, which messed up the stack-oriented Lisp code. Because I wrote so much of this stuff, and ported it to multiple machines, I simply confused it all.
I should have checked the manuals before I responded. I remember reading the arguments about size, but I no longer remember the details you mentioned, Rick. I have to plead stupidity for not checking the manuals and old age for bad memory. But a lot of this occurred more than 30 years ago, and a lot of code has passed through the old gray matter since, so please forgive my frailty, and excuse my stupidity, and Thank You for straightening me out.
Hey, no problem, Les. I'm probably in there pitching with you regarding age! We're allowed "senior moments" now and then! Heheheheheh! ;-p
----------------------------------------------------------------------
- Rick Stevens, Principal Engineer            rstevens@internap.com  -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
- Try to look unimportant. The bad guys may be low on ammo.          -
----------------------------------------------------------------------
On Tue, 29 May 2007 13:28:35 -0700 Rick Stevens rstevens@internap.com wrote:
Hey, no problem, Les. I'm probably in there pitching with you regarding age! We're allowed "senior moments" now and then! Heheheheheh! ;-p
Hey, speak for yourself, Mr... er... say, what was your name again?
Don't worry, though. The last time someone asked me what my name is I checked my wallet and told him Genuine Leather.
Rick Stevens wrote:
On Sat, 2007-05-26 at 08:03 -0500, Chris Schumann wrote:
Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
[snip]
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
Having been on the ANSI C committee back in the day, Chris is correct. C was NEVER a "virtual machine language" system. That was left to
But the Standard uses the term "abstract machine", which may be what he intended.
Niklaus Wirth's team with the old UCSD P-System Pascal language and its derivatives (such as Modula-2).
Those interested in historical matters might look here:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
[snip]
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
Actually, that was a huge bone of contention in the committee meetings. The 16-bit integer was the "native" size of the registers on the machine that C was developed on (the PDP-11), so it sorta stuck. However, the standard makes absolutely no guarantees on how big an "int" is. It is completely up to the implementer of the compiler as to how big an "int" is (or a "char" is or an "unsigned long long" is...that's why the "sizeof" operator exists).
This is incorrect. An integer must be able to represent at least numbers in the range -32767 to 32767 inclusive.
[QUOTE MODE ON]
5.2.4.2.1 Sizes of integer types <limits.h> [#1] The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign. ....
-- minimum value for an object of type int INT_MIN -32767
-- maximum value for an object of type int INT_MAX +32767
[QUOTE MODE OFF]
Since the Standard also requires a pure binary representation, this means that an int is at least 16 bits.
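A trivial test program (just <limits.h> and sizeof, nothing exotic) shows what any given implementation actually chose within those guarantees:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* The Standard guarantees INT_MIN <= -32767 and INT_MAX >= +32767;
       the exact values, and sizeof(int), are up to the implementation. */
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("sizeof(int) = %u\n", (unsigned)sizeof(int));
    printf("INT_MIN     = %d\n", INT_MIN);
    printf("INT_MAX     = %d\n", INT_MAX);
    return 0;
}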
There is also no guarantee as to what a "null" is, other than no legitimate object will ever have a "null" address. I know of one system where the null address is all ones. The compiler knows that '\0' should be converted to the all ones format as well as when the word "null" should be, too.
I have used a system which had three values for the null address 0x0000 (in a certain mode only), 0x00000000, and 0x80000000. However, it must compare equal to an unsigned integer of the correct size when the integer has all bits off.
P.J. Plauger (founder of Whitesmiths') and the committee secretary had a great way of putting it: "An int is not a char and it's not a long, but will be the same as one or the other at various times in your career."
As far as the libraries are concerned, the initial draft of what was in the standard C library made the library so damned big that it wouldn't fit in the standard process memory footprint of a VAX at the time.
Care to substantiate that? As far as the Standard is concerned, it does not address packaging issues.
That's when it got split up into the standard library, the "math" library, the "double precision" library and several others.
This has nothing to do with Standard C. It is a packaging matter, which the Standard does not address. For example, consider this language from the Standard:
[QUOTE MODE ON]
5.2.4.2.2 Characteristics of floating types <float.h>
....
[#4] The accuracy of the floating-point operations (+, -, *, /) and of the library functions in <math.h> and <complex.h> that return floating-point results is implementation defined. The implementation may state that the accuracy is unknown.
[QUOTE MODE OFF]
IOW, it addresses where the prototypes exist, but not where the object code exists or how it gets into your program (if, indeed, it ever does, most of my programs wind up using a shared library).
Once the concept of splitting up libraries came up, lots of splits were proposed: string handling was going to be in a separate library, network stuff, file management, you name it. Some people actually did implement separate libraries, as the famous Sun network library split shows.
Mike
Mike McCarty wrote:
Those interested in historical matters might look here: http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
OK but...
Actually, that was a huge bone of contention in the committee meetings. The 16-bit integer was the "native" size of the registers on the machine that C was developed on (the PDP-11), so it sorta stuck. However, the standard makes absolutely no guarantees on how big an "int" is. It is completely up to the implementer of the compiler as to how big an "int" is (or a "char" is or an "unsigned long long" is...that's why the "sizeof" operator exists).
This is incorrect. An integer must be able to represent at least numbers in the range -32767 to 32767 inclusive.
[QUOTE MODE ON]
5.2.4.2.1 Sizes of integer types <limits.h>
A quote from a 1989 standard? What were integers between 1973 and 1989? The language was well established before any committee meetings.
Les Mikesell wrote:
Mike McCarty wrote:
Those interested in historical matters might look here: http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
OK but...
Actually, that was a huge bone of contention in the committee meetings. The 16-bit integer was the "native" size of the registers on the machine that C was developed on (the PDP-11), so it sorta stuck. However, the standard makes absolutely no guarantees on how big an "int" is. It is completely up to the implementer of the compiler as to how big an "int" is (or a "char" is or an "unsigned long long" is...that's why the "sizeof" operator exists).
This is incorrect. An integer must be able to represent at least numbers in the range -32767 to 32767 inclusive.
[QUOTE MODE ON]
5.2.4.2.1 Sizes of integer types <limits.h>
A quote from a 1989 standard? What were integers between 1973 and 1989? The language was well established before any committee meetings.
A quote from a 1999 standard, actually. I was responding to the words above "However, THE STANDARD [emphasis added] makes absolutely no guarantees on how big an 'int' is. It is completely up to the implementer of the compiler as to how big an 'int' is (or a 'char' is or an 'unsigned long long' is...that's why the 'sizeof' operator exists)."
Note the present tense used. This claim is incorrect. The Standard makes very definite statements about the sizes of integral types, in terms of what they are guaranteed to be able to represent.
Mike
On Wed, 2007-05-30 at 14:31 -0500, Mike McCarty wrote:
Rick Stevens wrote:
On Sat, 2007-05-26 at 08:03 -0500, Chris Schumann wrote:
Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
[snip]
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
Having been on the ANSI C committee back in the day, Chris is correct. C was NEVER a "virtual machine language" system. That was left to
But the Standard uses the term "abstract machine", which may be what he intended.
Niklaus Wirth's team with the old UCSD P-System Pascal language and its derivatives (such as Modula-2).
Those interested in historical matters might look here:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
[snip]
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
Actually, that was a huge bone of contention in the committee meetings. The 16-bit integer was the "native" size of the registers on the machine that C was developed on (the PDP-11), so it sorta stuck. However, the standard makes absolutely no guarantees on how big an "int" is. It is completely up to the implementer of the compiler as to how big an "int" is (or a "char" is or an "unsigned long long" is...that's why the "sizeof" operator exists).
This is incorrect. An integer must be able to represent at least numbers in the range -32767 to 32767 inclusive.
[QUOTE MODE ON]
5.2.4.2.1 Sizes of integer types <limits.h> [#1] The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign. ....
-- minimum value for an object of type int INT_MIN -32767
-- maximum value for an object of type int INT_MAX +32767
[QUOTE MODE OFF]
Since the Standard also requires a pure binary representation, this means that an int is at least 16 bits.
I haven't looked at the standard in probably 15 years. I got off the committee before the standard was approved (got tired of paying to go to the bloody meetings when the company I worked for balked at it).
There is also no guarantee as to what a "null" is, other than no legitimate object will ever have a "null" address. I know of one system where the null address is all ones. The compiler knows that '\0' should be converted to the all ones format as well as when the word "null" should be, too.
I have used a system which had three values for the null address 0x0000 (in a certain mode only), 0x00000000, and 0x80000000. However, it must compare equal to an unsigned integer of the correct size when the integer has all bits off.
Correct. The compiler knows what "null" means in various contexts, but it by no way means "all zeroes".
P.J. Plauger (founder of Whitesmiths') and the committee secretary had a great way of putting it: "An int is not a char and it's not a long, but will be the same as one or the other at various times in your career."
As far as the libraries are concerned, the initial draft of what was in the standard C library made the library so damned big that it wouldn't fit in the standard process memory footprint of a VAX at the time.
Care to substantiate that? As far as the Standard is concerned, it does not address packaging issues.
I'm talking about the very, very early days of discussing just what goes in the libraries--and indeed how many libraries there were going to be. The working document we had put almost everything in a single library. When we implemented it as a test case on the two most common systems we had (a PDP-11 and a VAX), the linker couldn't handle the library size. Indeed, without segmenting the program and using overlays, the famous "hello world" program couldn't fit on a VAX. Overlays worked, but it was spectacularly slow.
That's when it got split up into the standard library, the "math" library, the "double precision" library and several others.
This has nothing to do with Standard C. It is a packaging matter, which the Standard does not address. For example, consider this language from the Standard:
[QUOTE MODE ON]
5.2.4.2.2 Characteristics of floating types <float.h>
....
[#4] The accuracy of the floating-point operations (+, -, *, /) and of the library functions in <math.h> and <complex.h> that return floating-point results is implementation defined. The implementation may state that the accuracy is unknown.
[QUOTE MODE OFF]
IOW, it addresses where the prototypes exist, but not where the object code exists or how it gets into your program (if, indeed, it ever does, most of my programs wind up using a shared library).
Again, I've not looked at the standard in years. I'm simply relating what went on in the ANSI committee behind the scenes for the two years I was involved.
What the standard has is a distillation of all that went on before. Yes, things are in there that we sweated blood over and in some cases, I don't agree with.
What was an "implementation" detail and what was part of the language per se got blurred over many, many times. "Should we define that math operations of similar precedence be evaluated right-to-left, left-to-right, or leave it up to the implementation?" "In a construct such as '&ptr->object', which is more tightly bound, the '->' or the '&'?" "Do we mandate an RPN-style calculation, algebraic, or leave it up to the implementation?" "Should we implement a multi-level 'break' statement or provide a 'goto' and statement labels?" All of that sort of thing went on, with some stuff defined by the standard and some left to implementation.
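(For the record, the way that particular one came out: the postfix '->' binds more tightly than the unary '&', so the two forms below are equivalent. A throwaway example, assuming nothing beyond a conforming compiler:)

#include <stdio.h>

struct thing { int object; };

int main(void)
{
    struct thing t = { 42 };
    struct thing *ptr = &t;

    int *a = &ptr->object;     /* '->' binds tighter, so this is...   */
    int *b = &(ptr->object);   /* ...exactly the same as this form    */

    printf("%d %d\n", *a, *b); /* prints: 42 42 */
    return 0;
}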
As I said, I wasn't there at the end. The money was an issue, but also many of the meetings simply degenerated into finger-pointing sessions. As Charles Schulz once stated in "Peanuts": "Beneath the calm exterior lurk the dynamics of a nursery school recess." All in all, C is a great language, but it does have warts. It's a giraffe...a horse designed by a committee.
Once the concept of splitting up libraries came up, lots of splits were proposed: string handling was going to be in a separate library, network stuff, file management, you name it. Some people actually did implement separate libraries, as the famous Sun network library split shows.
----------------------------------------------------------------------
- Rick Stevens, Principal Engineer            rstevens@internap.com  -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
- "Microsoft is a cross between The Borg and the Ferengi.            -
-  Unfortunately they use Borg to do their marketing and Ferengi to  -
-  do their programming."                         -- Simon Slavin    -
----------------------------------------------------------------------
Rick Stevens wrote:
On Wed, 2007-05-30 at 14:31 -0500, Mike McCarty wrote:
[snip]
As far as the libraries are concerned, the initial draft of what was in the standard C library made the library so damned big that it wouldn't fit in the standard process memory footprint of a VAX at the time.
Care to substantiate that? As far as the Standard is concerned, it does not address packaging issues.
I'm talking about the very, very early days of discussing just what goes in the libraries--and indeed how many libraries there were going to be.
Ok. Your use of the term "standard C library" is somewhat irregular, then. That term now has a specific defined meaning. But if that is what you meant, then fine.
The working document we had put almost everything in a single library. When we implemented it as a test case on the two most common systems we had (a PDP-11 and a VAX), the linker couldn't handle the library size. Indeed, without segmenting the program and using overlays, the famous "hello world" program couldn't fit on a VAX. Overlays worked, but it was spectacularly slow.
One of the first C implementations I used was one on the IBM PC for PCDOS, with a limit of 64K total code+data. Ever try using overlays on a two-floppy system? I have.
Grrr, grrr, grrr, was a sound which frequently came not only from the floppy disc drives, but from other sources as well :-)
[snip]
As I said, I wasn't there at the end. The money was an issue, but also many of the meetings simply degenerated into finger-pointing sessions. As Charles Schulz once stated in "Peanuts": "Beneath the calm exterior lurk the dynamics of a nursery school recess." All in all, C is a great language, but it does have warts. It's a giraffe...a horse designed by a committee.
Plauger and I exchanged a few e-mails during that period, and he made comments of a similar nature. I got to meet him over on comp.lang.c 'way back. I'm sure he doesn't remember me, though.
[snip]
Mike
On Wed, 2007-05-30 at 15:34 -0500, Mike McCarty wrote:
Rick Stevens wrote:
On Wed, 2007-05-30 at 14:31 -0500, Mike McCarty wrote:
[snip]
As far as the libraries are concerned, the initial draft of what was in the standard C library made the library so damned big that it wouldn't fit in the standard process memory footprint of a VAX at the time.
Care to substantiate that? As far as the Standard is concerned, it does not address packaging issues.
I'm talking about the very, very early days of discussing just what goes in the libraries--and indeed how many libraries there were going to be.
Ok. Your use of the term "standard C library" is somewhat irregular, then. That term now has a specific defined meaning. But if that is what you meant, then fine.
The working document we had put almost everything in a single library. When we implemented it as a test case on the two most common systems we had (a PDP-11 and a VAX), the linker couldn't handle the library size. Indeed, without segmenting the program and using overlays, the famous "hello world" program couldn't fit on a VAX. Overlays worked, but it was spectacularly slow.
One of the first C implementations I used was one on the IBM PC for PCDOS, with a limit of 64K total code+data. Ever try using overlays on a two-floppy system? I have.
Grrr, grrr, grrr, was a sound which frequently came not only from the floppy disc drives, but from other sources as well :-)
[snip]
As I said, I wasn't there at the end. The money was an issue, but also many of the meetings simply degenerated into finger-pointing sessions. As Charles Schulz once stated in "Peanuts": "Beneath the calm exterior lurk the dynamics of a nursery school recess." All in all, C is a great language, but it does have warts. It's a giraffe...a horse designed by a committee.
Plauger and I exchanged a few e-mails during that period, and he made comments of a similar nature. I got to meet him over on comp.lang.c 'way back. I'm sure he doesn't remember me, though.
I am somewhat jealous of your getting to know Plauger. I did implement PASCAL from the diagrams in the back of the book, and it took some time to do it. I was working on a CP/M based system by then, and only had one single-density hard-sectored disk drive from Northstar. It took a while. I wrote the BYTE Tiny Pascal system and an editor and debugger that would all fit in 64K without overlays or reloads. I don't now remember the program size. I do remember that my video board overlaid some of the RAM, so I didn't really have 64K, but since it was a character-based display, it wasn't too much to pay. The disk drive would hold 70K.

I still have it, I think. I brought the case (real steel with its own power supply) into the house thinking I might use it to hold some kind of backup drive. It would power several of today's multi-hundred-gigabyte disks. Don't know if I will attempt it or not. That linear supply is probably quite a power hog. But it sure is built like a Sherman tank.
As to the C stuff, thanks, Mike; that comment from the standard about the machine may be what misled me. Also, the way that I implemented my original C compiler may have been unique. I just don't remember any more. I do remember the joy in my heart when I did the "Hello World!" program and it really worked. By that time I had bought a nice Morrow Microdecision with two floppy disk drives. Quite the machine at the time. Some 5 years later I bought my first PC, a 386 running at I think 35 or 40 MHz. And on it I used the very first commercial compiler I had ever had. It was a very nice C package from, I think, Brown Bag Software. How about that for a name from the past.
Now here I sit with two systems at my disposal, and the "archaic" one is 10 times faster with 2000 times as much memory, and the "good one" is dual processor with so much power it boggles my mind. What a difference a few decades makes.
But C is portable. Just in this thread, we have discussed it running on the Z80 and 8080 up to Xeon processors, AMD, PDP, VAX, DEC, and IBM mainframes. What a tribute to Kernighan and Ritchie. Wonderful system, and beautiful to code in. So close to the native machine, and yet so expressive. It was truly inspired. I have used other languages, from BASIC to APL (which is probably the most arcane you can get) and even Cyber systems, but I prefer the simple elegance and efficiency of C whenever possible.
I also remember the big-endian vs little-endian arguments, and some of the consternation that caused. And the arguments about re-entrant code, and variable wipes. C never initializes, so there was a huge contingent that wanted to initialize things, but when this was attempted, it broke some things and made some people's code even more obscure. I also remember the arguments over self-modifying code, and even remember seeing some C code that did just that, and was capable of porting across platforms with the same performance. Try that in assembly. It was and is amazing what a talented mind can do with C.
Regards, Les H
Les wrote:
On Wed, 2007-05-30 at 15:34 -0500, Mike McCarty wrote:
[snip]
Plauger and I exchanged a few e-mails during that period, and he made comments of a similar nature. I got to meet him over on comp.lang.c 'way back. I'm sure he doesn't remember me, though.
I am somewhat jealous of your getting to know Plauger. I did implement
Don't be. We exchanged perhaps eight e-mails (eight from each of us, that is). Chalk that up more to his willingness to exchange emails with people that he doesn't know than anything else. He was very polite and seemed to listen to my comments.
PASCAL from the diagrams in the back of the book, and it took some time to do it. I was working on a CPM based system by then, and only had one single density hardsectored disk drive from Northstar. It took a while.
I bet it did. I ran on a single floppy PCDOS machine for quite a while. I booted and loaded my software into a RAMDISC and then put another floppy in for data. I needed a copy of COMMAND.COM on both discs, so when the CI needed to be reloaded, I didn't have to swap floppy discs.
I implemented my first RTOS using a dual 8" floppy system with an 8085 uP and 64K total RAM+ROM using CP/M. All written in RMAC.
[snip]
expressive. It was truly inspired. I have used other languages, from BASIC to APL (which is probably the most arcane you can get) and even Cyber systems, but I prefer the simple elegance and efficiency of C whenever possible.
First machine I programmed was an IBM 1401 using machine language. Then came an IBM System/360 running APL. Then came a CYBER 6000 using machine language again. Then a Fairchild F8 with machine language.
I also remember the big-endian vs little-endian arguments, and some of the consternation that caused. And the arguments about re-entrant code, and variable wipes. C never initializes, so there was a huge
C doesn't initialize what? It initializes all used variables.
contingent that wanted to initialize things, but when this was attempted, it broke some things and made some people's code even more obscure. I also remember the arguments over self-modifying code, and even remember seeing some C code that did just that, and was capable of
I only wrote one self-modifying program, and then only out of necessity. I needed to use a computed port address for I/O on an 8085, which doesn't have that address mode.
[snip]
Mike
Mike McCarty wrote:
Les wrote:
On Wed, 2007-05-30 at 15:34 -0500, Mike McCarty wrote:
[snip]
Plauger and I exchanged a few e-mails during that period, and he made comments of a similar nature. I got to meet him over on comp.lang.c 'way back. I'm sure he doesn't remember me, though.
I am somewhat jealous of your getting to know Plauger. I did implement
Don't be. We exchanged perhaps eight e-mails (eight from each of us, that is). Chalk that up more to his willingness to exchange emails with people that he doesn't know than anything else. He was very polite and seemed to listen to my comments.
PASCAL from the diagrams in the back of the book, and it took some time to do it. I was working on a CPM based system by then, and only had one single density hardsectored disk drive from Northstar. It took a while.
I bet it did. I ran on a single floppy PCDOS machine for quite a while. I booted and loaded my software into a RAMDISC and then put another floppy in for data. I needed a copy of COMMAND.COM on both discs, so when the CI needed to be reloaded, I didn't have to swap floppy discs.
I implemented my first RTOS using a dual 8" floppy system with an 8085 uP and 64K total RAM+ROM using CP/M. All written in RMAC.
[snip]
expressive. It was truly inspired. I have used other languages, from BASIC to APL (which is probably the most arcane you can get) and even Cyber systems, but I prefer the simple elegance and efficiency of C whenever possible.
First machine I programmed was an IBM 1401 using machine language. Then came an IBM System/360 running APL. Then came a CYBER 6000 using machine language again. Then a Fairchild F8 with machine language.
I also remember the big-endian vs little-endian arguments, and some of the consternation that caused. And the arguments about re-entrant code, and variable wipes. C never initializes, so there was a huge
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
#include <stdio.h>

int main()
{
    int a;

    printf("%d\n", a);

    return 0;
}

$ gcc -Wall test.c
$ ./a.out
6874720
$ ./a.out
9766496
$ gcc --version
gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)
-Andy
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
Mike
Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
You're welcome.
I think you have to draw a line between the glorious worlds of possibility left open by the wording of the standard and the grimy reality of the actual compilers. "Automatic variables" will be allocated off the stack frame on any actual compiler that is worth the name.
Though back in the day I did read about a C "compiler" for a tragic arch that neither had a stack nor more than 256 bytes of RAM IIRC... it disallowed recursion or more than 2 function call depth... other than that I propose any compiler will be using a stack frame whether the word is in the standard or not ;-)
-Andy
Andy Green wrote:
Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
You're welcome.
I think you have to draw a line between the glorious worlds of possibility left open by the wording of the standard and the grimy reality of the actual compilers. "Automatic variables" will be allocated off the stack frame on any actual compiler that is worth the name.
I wonder what you mean by "any actual compiler that is worth the name". The 8051 normally uses no stack at all.
Though back in the day I did read about a C "compiler" for a tragic arch that neither had a stack nor more than 256 bytes of RAM IIRC... it disallowed recursion or more than 2 function call depth... other than that I propose any compiler will be using a stack frame whether the word is in the standard or not ;-)
The 8051 has Harvard architecture, giving it 64K of data space and 64K of program space. But it has no real stack, so compilers for it normally don't implement one unless specifically requested so to do. Doing so causes a significant execution penalty. There are LOTS of 8051s (and its progeny) out there.
Mike
On Thu, 2007-05-31 at 17:01 -0500, Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
You're welcome.
I think you have to draw a line between the glorious worlds of possibility left open by the wording of the standard and the grimy reality of the actual compilers. "Automatic variables" will be allocated off the stack frame on any actual compiler that is worth the name.
I wonder what you mean by "any actual compiler that is worth the name". The 8051 normally uses no stack at all.
Uh, I probably shouldn't re-enter this discussion, but I'll throw caution to the winds and put on my fire suit...
While a language such as C is useful on things such as 8051s, it was really designed for general purpose computers. I don't think I'd call an 8051 a "general purpose" processor. IIRC it's a really nice microcontroller--very useful for physical process control and the like along with many other processors of its ilk. My memory on those sorts of things is a bit fuzzy, I'll admit.
Though back in the day I did read about a C "compiler" for a tragic arch that neither had a stack nor more than 256 bytes of RAM IIRC... it disallowed recursion or more than 2 function call depth... other than that I propose any compiler will be using a stack frame whether the word is in the standard or not ;-)
The 8051 has Harvard architecture, giving it 64K of data space and 64K of program space. But it has no real stack, so compilers for it normally don't implement one unless specifically requested so to do. Doing so causes a significant execution penalty. There are LOTS of 8051s (and its progeny) out there.
I don't doubt that. I worked with it and its sire, the famous 8048 (ah, a processor with a built-in UVEPROM), their grandfather (the 8008) and their great-grandfather (the 4004). I've done bitslice stuff, too. That was long, long ago in a computer lab far, far away.
One of my favorite old minis was the HP 2100/21MX series. It had a stack, but the first instruction of any subroutine had to be a no-op instruction as that's where the processor stuck the return address of the calling routine. Compilers made that reasonably easy, but back in the day of coding via assembly, you always had to remember that your subroutine had to start with:
label: nop
Lots of weird stuff happened when you forgot to do that. I remember blowing up a pretty damned expensive servo once when I forgot. Life was, uh, interesting back then (late 70s). And no, I didn't "disco".
----------------------------------------------------------------------
- Rick Stevens, Principal Engineer            rstevens@internap.com  -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
- Change is inevitable, except from a vending machine.               -
----------------------------------------------------------------------
On Thu, 31 May 2007, Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
For completeness, dynamically allocated memory is not initialized either, unless allocated with calloc() (in which case, it is initialized to zeros) or realloc() (in which case, any added space is not initialized).
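A minimal sketch of the difference (note that the all-bits-zero storage from calloc() is the value 0 for integer types, though, as discussed elsewhere in this thread, all-bits-zero is not necessarily a null pointer or 0.0 on every machine):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *m = malloc(4 * sizeof *m); /* contents are indeterminate       */
    int *c = calloc(4, sizeof *c);  /* guaranteed all-bits-zero         */

    if (m == NULL || c == NULL)
        return 1;

    /* Reading m[0] here would be reading an indeterminate value;
       c[0] through c[3] are well defined.                              */
    printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]); /* 0 0 0 0 */

    free(m);
    free(c);
    return 0;
}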
Mike
On Thu, 2007-05-31 at 05:52 -0500, Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
Mike
That must be something recent. Also, some people believed that it did in the past, but it did not. The variables are just assigned space. The contents of the space was whatever was left from the prior use. Since many developers reboot before starting development, the space is often preset to 0 (due to memory checks or perhaps a behavior of some dynamic RAM operation), so they believed the variables were initialized, but they were not. This was as of 2000 on a Sun Solaris 8 system using K&R. I know this for a fact. As to how different compilers deal with it, the static variable space is between the initial entry jump and the beginning of code in most designs, and that space may be initialized by the compiler at compile time, such that uninitialized variables are preset to 0. But the original K&R spec did not include initializing variables except explicitly.
This part I am sure of, because I have had to fix many, many people's code due to this belief. The ANSI committee may have changed the standard, but I would bet that a lot of older compilers still generate code with no initialization.
Regards, Les H
Les wrote:
On Thu, 2007-05-31 at 05:52 -0500, Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
Mike
That must be something recent. Also some people believed that it did in
IIRC, it was mentioned in K&R v1.
[snip]
This part I am sure of, because I have had to fix many, many people's code due to this belief. The ANSI committee may have changed the standard, but I would bet that a lot of older compilers still generate code with no initialization.
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
Mike
Mike McCarty wrote:
That must be something recent. Also some people believed that it did in
IIRC, it was mentioned in K&R v1.
[snip]
This part I am sure of, because I have had to fix many, many people's code due to this belief. The ANSI committee may have changed the standard, but I would bet that a lot of older compilers still generate code with no initialization.
I sort-of remember some discussion about this back when people were trying to run unix on machines without memory management. Usually the OS would zero blocks of memory before allocating it to a process for security reasons but I don't think C ever guaranteed that. Lint should always have warned you about using uninitialized variables, though.
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
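(For anyone who missed that old chestnut: E1[E2] is defined as *((E1) + (E2)), and a string literal decays to a pointer to its first element, so the following holds on any conforming implementation. A throwaway example:)

#include <stdio.h>

int main(void)
{
    /* "abcd"[2] is *("abcd" + 2), i.e. the third character.           */
    printf("%c\n", "abcd"[2]);  /* c */

    /* Addition commutes, so the infamous reversed form works as well. */
    printf("%c\n", 2["abcd"]);  /* c */
    return 0;
}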
Les Mikesell wrote:
Mike McCarty wrote:
[Les wrote about "older" compilers]
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
I wasn't explicit enough, I guess. I wouldn't characterize 19 years as "older", but rather as "antique" or "ancient" in this context. I was objecting to using too weak a word, not too strong a word :-)
Mike
On Thu, 2007-05-31 at 18:44 -0500, Mike McCarty wrote:
Les Mikesell wrote:
Mike McCarty wrote:
[Les wrote about "older" compilers]
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
I wasn't explicit enough, I guess. I wouldn't characterize 19 years as "older", but rather as "antique" or "ancient" in this context. I was objecting to using too weak a word, not too strong a word :-)
I'm sorry, I don't think of myself as antique or ancient. You assume that all C compilers are ANSI compliant. They are not. K&R was around a long time before the standards committee got involved. And although the standard may have been generated in 1989, Microsoft didn't implement the ANSI standard for several years after that, and then failed some of the standard tests. SUN's compiler was not ANSI compliant until about 1994 or so. There are many compilers out there, and most are probably not fully compliant either. AND I do know that the variables are not initialized in many compilers. I have fixed code for many, many people. I know why their programs failed.

I also know that C uses a pushdown stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main()
{
    int mi,mj,mk;
    int *x;
    mi=4;mj=5;mk=6;
    x=&mk;
    printf ("%d %d %d\n",*x++,*X++;*X++);
    x=&i;
    printf ("%d %d %d\n",*x++,*x++,*x++);
    i-1;j=2;k=3;
    printf ("%d %d %d\n",*x++,*x++,*x++);
)
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times. One case is, as has been pointed out here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF, 0xFF00, or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8-bit ASCII, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
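(The terminator is the one thing here you can write portably: spell it '\0' in source and let the compiler worry about the machine's representation. A throwaway example:)

#include <stdio.h>

int main(void)
{
    const char *s = "hello";
    int n = 0;

    /* Every string literal is stored with a terminating '\0', so
       scanning up to it is portable regardless of representation.     */
    while (s[n] != '\0')
        n++;

    printf("length = %d\n", n); /* length = 5 */
    return 0;
}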
And since that is so, how are those variables initialized? And to what value? What is a pointer set to when it is initialized? Hint: on Cyber, the supposed default for assigned pointers used to be the address of the pointer. Again, system dependencies may get you.
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
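A sketch of that idea in C terms (purely illustrative, no particular machine or compiler in mind): the "stack" is just storage plus a pointer that moves as data is pushed and popped, independent of any processor call/return mechanism.

#include <stdio.h>

/* A toy push-down stack: storage plus a moving pointer. No bounds
   checking -- it is only a sketch of the mechanism being described.   */
static int storage[16];
static int *sp = storage;            /* points at the next free slot   */

static void push(int v) { *sp++ = v; }
static int  pop(void)   { return *--sp; }

int main(void)
{
    push(1);
    push(2);
    push(3);

    int a = pop(), b = pop(), c = pop();   /* popped in reverse order  */
    printf("%d %d %d\n", a, b, c);         /* 3 2 1 */
    return 0;
}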
Regards, Les H
On Thu, 31 May 2007, Les wrote:
On Thu, 2007-05-31 at 18:44 -0500, Mike McCarty wrote:
Les Mikesell wrote:
Mike McCarty wrote:
[Les wrote about "older" compilers]
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
I wasn't explicit enough, I guess. I wouldn't characterize 19 years as "older", but rather as "antique" or "ancient" in this context. I was objecting to using too weak a word, not too strong a word :-)
I'm sorry, I don't think of myself as antique or ancient. You assume that all C compilers are ANSI compliant. They are not. K&R was around a long time before the standards committee got involved. And although the standard may have been generated in 1989, Microsoft didn't implement the ANSI standard for several years after that, and then failed some of the standard tests. SUN's compiler was not ANSI compliant until about 1994 or so. There are many compilers out there, and most are probably not fully compliant either. AND I do know that the variables are not initialized in many compilers. I have fixed code for many, many people.
If static variables are not initialized, those compilers are broken. K&R would tell you so even for pre-ANSI/ISO compilers.
I know why their programs failed. I also know that C uses a pushdown
^some particular implementations of
stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main() { int mi,mj,mk; int *x; mi=4;mj=5;mk=6; x=&mk; printf ("%d %d %d\n",*x++,*X++;*X++); x=&i; printf ("%d %d %d\n",*x++,*x++,*x++); i-1;j=2;k=3; printf ("%d %d %d\n",*x++,*x++,*x++); )
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
Of course, several constructs here are undefined, so there is no such thing as "correct" or "incorrect" behavior.
After correcting obvious typos and adding #include <stdio.h> so it would compile, I got (using gcc-4.1.1-51.fc6 with no options):
$ ./a.out
5 4 6
0 0 0
0 0 0
Was that what you were expecting?
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times.
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
One case is as has been pointed outhere, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
In your source code, NULL is *always* written 0 (or sometimes (void *) 0 to indicate that it's intended to stand for a null pointer value, not a NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial. If you type-pun to try to look at the actual machine's representation, your program's behavior is undefined and you deserve what you get. It's the compiler's responsibility to ensure that things work as expected, no matter what the machine's representation is. (For example, '\0' == 0 must return 1.)
And since that is so, how are those variables initialized? and to what value? What is a pointer set to when it is intialized. Hint, on Cyber the supposed default for assigned pointers used to the the address of the pointer. Again, system dependencies may get you.
Pre-ANSI/ISO compilers might have initialized static memory to all-bits-zero even when that was not the correct representation of the default for the type being initialized. ANSI/ISO compilers are not allowed to do that. The required default initializations are well defined. (This is the sort of thing that motivates the creation of standards in the first place.)
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
But usually in the context of discussions about compiler architectures, call stacks are exactly what is meant.
Regards, Les H
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
On Thu, 31 May 2007, Les wrote:
On Thu, 2007-05-31 at 18:44 -0500, Mike McCarty wrote:
Les Mikesell wrote:
Mike McCarty wrote:
[Les wrote about "older" compilers]
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
I wasn't explicit enough, I guess. I wouldn't characterize 19 years as "older", but rather as "antique" or "ancient" in this context. I was objecting to using too weak a word, not too strong a word :-)
I'm sorry, I don't think of myself as antique or ancient. You assume that all C compilers are ANSI compliant. They are not. K&R was around a long time before the standards committee got involved. And although the standard may have been generated in 1989, Microsoft didn't implement the ANSI standard for several years after that, and then failed some of the standard tests. SUN's compiler was not ANSII compliant until about 1994 or so. There are many compilers out there, and most are probably not fully compliant either. AND I do know that the variables are not initialized in many compilers. I have fixed code for many, many people.
If static variables are not initialized, those compilers are broken. K&R woud tell you so even for pre-ANSI/ISO compilers.
I know why their programs failed. I also know that C uses a pushdown
^some particular implementations of
stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main() { int mi,mj,mk; int *x; mi=4;mj=5;mk=6; x=&mk; printf ("%d %d %d\n",*x++,*X++;*X++); x=&i; printf ("%d %d %d\n",*x++,*x++,*x++); i-1;j=2;k=3; printf ("%d %d %d\n",*x++,*x++,*x++); )
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
Of course, several constructs here are undefined, so there is no such thing as "correct" or "incorrect" behavior.
After correcting obvious typos and adding #include <stdio.h> so it would compile, I got (using gcc-4.1.1-51.fc6 with no options):
$ ./a.out 5 4 6 0 0 0 0 0 0
OOPS, forgot to reset the X pointer between the last two print statements. This bit of code is intended to show that globals are on a heap and locals are on a stack.
Was that what you were expecting?
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times.
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
There is a difference between implementation differences and hardware errors, which was the microsoft error. They had a bug in their silicon compiler that caused that IIRC.
One case is as has been pointed outhere, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
In your source code, NULL is *always* written 0 (or sometimes (void *) 0 to indicate that it's intented to stand for a null pointer value, not a NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial. If you type-pun to try to look at the actual machine's representation, your program's behavior is undefined and you deserve what you get. It's the compiler's responsibility to ensure that things work as expected, no matter what the machine's representation is. (For example, '\0' == 0 must return 1.)
'\0' is an escape forcing the 0, so of course this will be equal.
And since that is so, how are those variables initialized? and to what value? What is a pointer set to when it is intialized. Hint, on Cyber the supposed default for assigned pointers used to the the address of the pointer. Again, system dependencies may get you.
Pre-ANSI/ISO compilers might have initialized static memory to all-bits-zero even when that was not the correct representation of the default for the type being initialized. ANSI/ISO compilers are not allowed to do that. The required default initializations are well defined. (This is the sort of thing that motivates the creation of standards in the first place.)
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
But usually in the context of discussions about compiler architectures, call stacks are exactly what is meant.
I am not sure that is true, because in some implementations the data heap and stack are in the same segment of memory, while the runtime stack for the processor is somewhere else. For high-security systems, running this way should be a requirement. It prevents obvious means of inserting malicious code through variable initialization and then stack manipulation. I say should be, because it has been tossed around from time to time, but I am unsure if it has ever been formalized.
One system I worked on looked like this:
  init jump
  heap
  variable stack (push down)
  program entrance
  program
  local libraries
  relocation table
  symbol table (if not removed)
  machine stack
Unfortunately I no longer remember which system that was. Just the fact that some standard libraries at that time would not run on it because they did manipulate the stack.
Regards, Les H
Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
[snip]
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
There is a difference between implementation differences and hardware errors, which was the microsoft error. They had a bug in their silicon compiler that caused that IIRC.
Intel != Microsoft
They used a table indexed by the first few bits of the remainder. Part of the table did not get filled in due to a programming error. IIRC, it was the first four bits of the remainder. Anyway, some info is here:
http://www.cs.earlham.edu/~dusko/cs63/fdiv.html
I'd provide you with Intel's white paper URL, but my internet access is having problems at the moment.
[snip]
NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial.
No, the value of the null character is specified...
[QUOTE MODE ON] 5.2.1 Character sets
....
[#2] In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string. [QUOTE MODE OFF]
[snip]
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
Each implementation is its own. There is no fact about all implementations of library routines.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Every conforming implementation of C must make it such that all functions are reentrant and must support recursion. How much work that is on any given hardware depends on how closely the hardware corresponds to the requirements.
[snip]
Mike
On Fri, 1 Jun 2007, Mike McCarty wrote:
Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
[snip]
NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial.
No, the value of the null character is specified...
[QUOTE MODE ON] 5.2.1 Character sets
....
[#2] In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.[QUOTE MODE OFF]
Touche. But my real (snipped) point was about representation of null pointers.
[snip]
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
Each implementation is its own. There is no fact about all implementation of library routines.
It's pretty hard to implement rand() in a re-entrant way, because it has to maintain a static internal state. There are a couple of string.h functions that maintain static internal state information as well, where repeated calls with the same argument string continue scanning the string from where the previous call left off.
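(For reference, the string.h functions with hidden state described here are presumably strtok() and friends; a minimal sketch of the difference between strtok(), which keeps the scan position in static storage, and the POSIX re-entrant variant strtok_r(), which keeps it in a caller-supplied pointer:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[] = "one two three";
    char b[] = "red green blue";
    char *sa, *sb;          /* per-scan state lives in the caller */
    char *ta, *tb;

    /* strtok() would keep its position in a single hidden static buffer,
       so interleaving two scans would corrupt them; strtok_r() does not. */
    ta = strtok_r(a, " ", &sa);
    tb = strtok_r(b, " ", &sb);
    while (ta != NULL || tb != NULL) {
        printf("%s / %s\n", ta ? ta : "-", tb ? tb : "-");
        if (ta) ta = strtok_r(NULL, " ", &sa);
        if (tb) tb = strtok_r(NULL, " ", &sb);
    }
    return 0;
}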
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Every conforming implementation of C must make it such that all functions are reentrant and must support recursion. How much work that is on any given hardware depends on how closely the hardware corresponds to the requirements.
Not all functions can be made re-entrant.
[snip]
Mike
Matthew Saltzman wrote:
On Fri, 1 Jun 2007, Mike McCarty wrote:
Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
[snip]
NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial.
No, the value of the null character is specified...
[snip Standard quote]
Touche. But my real (snipped) point was about representation of null pointers.
I wasn't trying to score points. I was trying to prevent the dissemination of incorrect material. There is already enough floating around about C that 'tain't so. We don't need more. I make a regular habit of checking with the Standard before making statements.
Anyway, that's why I snipped that portion. It was correct.
[snip]
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
Each implementation is its own. There is no fact about all implementation of library routines.
It's pretty hard to implement rand() in a re-entrant way, because it has
I've used a version of rand() which is thread safe. I suspect you may have as well. Does the implementation you customarily use have pthreads? Re-entrancy is a slippery word, as it means so many different things to different people. There are, to my knowledge, at least these possibilities:
Not reusable, may only ever be called once during the lifetime of the program. Examples would be subroutines written on the fly on the execution stack.
Serially reusable, may be called repeatedly, but must not be called while some thread of execution is using it.
Re-entrant, but blocking, may be called while some thread of execution is using it, but it blocks if it is already in use. This type of code may be called from threads, but not from interrupt handlers, for example.
Re-entrant, non-blocking, non-recursive, does not block if called while some thread of execution is already using it, but may not be entered from within itself. This kind of code may not be called in such a manner that it eventually winds up re-invoking itself from some function it may call. Such functions may not be called from any interrupt handler which may still be running when the next interrupt happens, for example.
Recursive, no restriction on use.
to maintain a static internal state. There are a couple of string.h functions that maintain static internal state information as well, where repeated calls with the same argument string continue scanning the string from where the previous call left off.
These must be duplicated per thread for them to be thread safe. Such implementations exist.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Every conforming implementation of C must make it such that all functions are reentrant and must support recursion. How much work that is on any given hardware depends on how closely the hardware corresponds to the requirements.
Not all functions can be made re-entrant.
All functions may be made re-entrant but blocking.
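(A minimal sketch of that "re-entrant but blocking" category, assuming POSIX threads and a made-up function: the static state is protected by a mutex, so a second caller simply waits rather than corrupting the state, though the function still must not be called from a signal handler that might interrupt the lock holder.)

#include <pthread.h>

static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long   id_count;        /* static state, starts at 0 */

/* Hypothetical example: hand out unique ids to any number of threads. */
unsigned long next_id(void)
{
    unsigned long id;

    pthread_mutex_lock(&id_lock);       /* later callers block here */
    id = ++id_count;
    pthread_mutex_unlock(&id_lock);
    return id;
}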
Mike
On Fri, 1 Jun 2007, Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
I know why their programs failed. I also know that C uses a pushdown
^some particular implementations of
stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main() { int mi,mj,mk; int *x; mi=4;mj=5;mk=6; x=&mk; printf ("%d %d %d\n",*x++,*X++;*X++); x=&i; printf ("%d %d %d\n",*x++,*x++,*x++); i-1;j=2;k=3; printf ("%d %d %d\n",*x++,*x++,*x++); )
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
Of course, several constructs here are undefined, so there is no such thing as "correct" or "incorrect" behavior.
After correcting obvious typos and adding #include <stdio.h> so it would compile, I got (using gcc-4.1.1-51.fc6 with no options):
$ ./a.out
5 4 6
0 0 0
0 0 0

OOPS, forgot to reset the X pointer between the last two print statements. This bit of code is intended to show that globals are on a heap and locals are on a stack.
Fixed that. Now I get:
$ ./a.out
5 4 6
0 0 0
0 2 1
But I confess, I don't see how this code proves your point. It does demonstrate that globals are initialized by default, though.
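(If the goal is just to show that file-scope variables start at zero while automatics start as garbage, and that the two usually live in different regions of memory, a version of the exercise without the undefined pointer walks might look like this sketch; the "different regions" part is how common implementations behave, not something the standard promises:)

#include <stdio.h>

int i, j, k;                      /* static storage: guaranteed to start at 0 */

int main(void)
{
    int mi = 4, mj = 5, mk = 6;   /* automatic storage: well defined here only
                                     because they are explicitly initialized */

    printf("statics:    %d %d %d  at %p %p %p\n",
           i, j, k, (void *)&i, (void *)&j, (void *)&k);
    printf("automatics: %d %d %d  at %p %p %p\n",
           mi, mj, mk, (void *)&mi, (void *)&mj, (void *)&mk);
    return 0;
}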
Was that what you were expecting?
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times.
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
There is a difference between implementation differences and hardware errors, which was the microsoft error. They had a bug in their silicon compiler that caused that IIRC.
I could just as easily reference some other obscure compiler bug or implementation-defined behavior and make the same point. The thing about a standard is that there are clear requirements about what is implementation-defined and what is not. Static initialization in ISO C is not one of those implementation-defined things.
I will concede that explicit initializations--even to default values--might be a useful self-documentation tool.
One case is as has been pointed outhere, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
In your source code, NULL is *always* written 0 (or sometimes (void *) 0 to indicate that it's intented to stand for a null pointer value, not a NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial. If you type-pun to try to look at the actual machine's representation, your program's behavior is undefined and you deserve what you get. It's the compiler's responsibility to ensure that things work as expected, no matter what the machine's representation is. (For example, '\0' == 0 must return 1.)
'\0' is an escape forcing the 0, so of course this will be equal.
OK. But the main point is that it doesn't matter what bit pattern represents a null pointer. Your source code will always use the value 0 to represent it. For example,
int *p;
/* ...code that sets p... */
if ( p == 0 )   /* *not* if ( p == 0x80000000 ) or if ( p == 0xffffffff ) */
{
    /* ...handle null pointer value... */
}
And since that is so, how are those variables initialized? and to what value? What is a pointer set to when it is intialized. Hint, on Cyber the supposed default for assigned pointers used to the the address of the pointer. Again, system dependencies may get you.
Pre-ANSI/ISO compilers might have initialized static memory to all-bits-zero even when that was not the correct representation of the default for the type being initialized. ANSI/ISO compilers are not allowed to do that. The required default initializations are well defined. (This is the sort of thing that motivates the creation of standards in the first place.)
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Sure, you need to know that you can compile re-entrant code if you need it.
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
But usually in the context of discussions about compiler architectures, call stacks are exactly what is meant.
I am not sure that is true, because in some implementations, the data heap and stack are in the same segment of memory, while the runtime stack for the processor is somewhere else. For high security systems running this should be a requirement. It prevents obvious means of inserting malicious code through variable initialization, and then stack manipulation. I say should be, because it has been tossed around from time to time, but I am unsure if it has ever been formalized.
One system I worked on looked like this: init jump heap variable stack (push down) program entrance program local libraries relocation table symbol table (if not removed) machine stack
Unfortunately I no longer remember which system that was. Just the fact that some standard libraries at that time would not run on it because they did manipulate the stack.
Regards, Les H
On Fri, 2007-06-01 at 19:02 -0400, Matthew Saltzman wrote:
On Fri, 1 Jun 2007, Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
I know why their programs failed. I also know that C uses a pushdown
^some particular implementations of
stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main() { int mi,mj,mk; int *x; mi=4;mj=5;mk=6; x=&mk; printf ("%d %d %d\n",*x++,*X++;*X++); x=&i; printf ("%d %d %d\n",*x++,*x++,*x++); i-1;j=2;k=3; printf ("%d %d %d\n",*x++,*x++,*x++); )
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
Of course, several constructs here are undefined, so there is no such thing as "correct" or "incorrect" behavior.
After correcting obvious typos and adding #include <stdio.h> so it would compile, I got (using gcc-4.1.1-51.fc6 with no options):
$ ./a.out
5 4 6
0 0 0
0 0 0

OOPS, forgot to reset the X pointer between the last two print statements. This bit of code is intended to show that globals are on a heap and locals are on a stack.
Fixed that. Now I get:
$ ./a.out 5 4 6 0 0 0 0 2 1
But I confess, I don't see how this code proves your point. It does demonstrate that globals are initialized by default, though.
Actually, it doesn't. And this is the problem. Many people assume that because they obtained 0 one time, that the value was set in memory by some behind the scenes action of the compiler. In fact the memory could have been set by any of a number of actions. Some memory chips start with all data zero'ed (at the output, at the physical layer the construction is designed to minimize current drain and transitions, but that is another topic entirely.) In that case, if power had been off all memory not explicitly set would be zero by default. Another situation is when a memory checker runs, and leaves memory in a zero state (most do by design). Thus if the compiler doesn't initialize memory, and the memory where the code is placed has not been used in a prior run, the variable space will be zero. But if the program is deleted, and the memory filled with a nonzero pattern, and the code reloaded and compiled, the result may be much different, and can cause the program to crash. When the program is saved to disk as an executable, the memory pattern that is saved is the last state of the code, whatever that was, and depending on how the code development system saves the code, the variables may or may not be set to zero at save time. At load time, the memory will be initialized according to the data in the executable file.
So, while the compiler may initialize the variables, there are other issues that can impact the true state at run time, and therefore default state should not be relied on as the condition. After all, you create a variable to store information, don't you? Why would you not initialize it? Anyway, while this has been a good discussion, I hope that you have begun to realize that all is not just in the compiler, but in the implementation, in the memory of the system, and in the methods of implementing and running code. And by the way, Matthew, this is in no way criticizing you. I have heard of you before, and will probably hear great things from you in the future.
Good luck, and good fortune.
Was that what you were expecting?
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times.
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
There is a difference between implementation differences and hardware errors, which was the microsoft error. They had a bug in their silicon compiler that caused that IIRC.
I misspoke here, and said Microsoft, when I meant Intel.
I could just as easily reference some other obscure compiler bug or implementation-defined behavior and make the same point. The thing about a standard is that there are clear requirements about what is implementation-defined and what is not. Static initialization in ISO C is not one of those implementation-defined things.
I will concede that explicit initializations--even to default values--might be a useful self-documentation tool.
One case is as has been pointed outhere, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
In your source code, NULL is *always* written 0 (or sometimes (void *) 0 to indicate that it's intented to stand for a null pointer value, not a NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial. If you type-pun to try to look at the actual machine's representation, your program's behavior is undefined and you deserve what you get. It's the compiler's responsibility to ensure that things work as expected, no matter what the machine's representation is. (For example, '\0' == 0 must return 1.)
'\0' is an escape forcing the 0, so of course this will be equal.
OK. But the main point is that it doesn't matter what bit pattern represents a null pointer. Your source code will always use the value 0 to represent it. For example,
int *p; /* ...code that sets p... */ if ( p == 0 ) /* *not* if ( p == 0x80000000 ) or if ( p == 0xffffffff ) */ { /* ...handle null pointer value... */ }
Actually this is one of the problem areas. 0 is an explicit constant, and is actually zero. Only if using C++ and equality is overloaded for pointers will this work. Otherwise the actual contents of p will be used to compare to 0, and that will fail on some systems. Some compilers may deal with it as you expect, but I have not used one that did.
And since that is so, how are those variables initialized? and to what value? What is a pointer set to when it is intialized. Hint, on Cyber the supposed default for assigned pointers used to the the address of the pointer. Again, system dependencies may get you.
Pre-ANSI/ISO compilers might have initialized static memory to all-bits-zero even when that was not the correct representation of the default for the type being initialized. ANSI/ISO compilers are not allowed to do that. The required default initializations are well defined. (This is the sort of thing that motivates the creation of standards in the first place.)
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Sure, you need to know that you can compile re-entrant code if you need it.
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
But usually in the context of discussions about compiler architectures, call stacks are exactly what is meant.
I am not sure that is true, because in some implementations, the data heap and stack are in the same segment of memory, while the runtime stack for the processor is somewhere else. For high security systems running this should be a requirement. It prevents obvious means of inserting malicious code through variable initialization, and then stack manipulation. I say should be, because it has been tossed around from time to time, but I am unsure if it has ever been formalized.
One system I worked on looked like this: init jump heap variable stack (push down) program entrance program local libraries relocation table symbol table (if not removed) machine stack
Unfortunately I no longer remember which system that was. Just the fact that some standard libraries at that time would not run on it because they did manipulate the stack.
Regards, Les H
I have said all that I know. I hope it helps you all in the future. C is wonderful, compact, close to the machine, and a good language, capable of expressing many many complex concepts. I am sure there are other languages out there, and I have used a few, but I love C.
Regards, Les H
Sorry, readers, this is getting rather long and still pretty OT. Les, if you want to continue, perhaps we should take it offline?
Comments interspersed throughout.
On Sun, 3 Jun 2007, Les wrote:
On Fri, 2007-06-01 at 19:02 -0400, Matthew Saltzman wrote:
On Fri, 1 Jun 2007, Les wrote:
On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
I know why their programs failed. I also know that C uses a pushdown
^some particular implementations of
stack for variables in subroutines. You can check it out with a very simple program using pointers:
#include <sttlib.h>
int i,j,k;
main() { int mi,mj,mk; int *x; mi=4;mj=5;mk=6; x=&mk; printf ("%d %d %d\n",*x++,*X++;*X++); x=&i; printf ("%d %d %d\n",*x++,*x++,*x++); i-1;j=2;k=3; printf ("%d %d %d\n",*x++,*x++,*x++); )
Just an exercise you understand. compile and run this with several c packages, or if the package you choose supports it, have it compile K&R. and try it.
Of course, several constructs here are undefined, so there is no such thing as "correct" or "incorrect" behavior.
After correcting obvious typos and adding #include <stdio.h> so it would compile, I got (using gcc-4.1.1-51.fc6 with no options):
$ ./a.out
5 4 6
0 0 0
0 0 0

OOPS, forgot to reset the X pointer between the last two print statements. This bit of code is intended to show that globals are on a heap and locals are on a stack.
Fixed that. Now I get:
$ ./a.out 5 4 6 0 0 0 0 2 1
But I confess, I don't see how this code proves your point. It does demonstrate that globals are initialized by default, though.
Actually, it doesn't. And this is the problem. Many people assume that
Note I said "demonstrate", not "prove". For a math teacher, there's an important distinction 8^).
because they obtained 0 one time, that the value was set in memory by some behind the scenes action of the compiler. In fact the memory could have been set by any of a number of actions. Some memory chips start with all data zero'ed (at the output, at the physical layer the construction is designed to minimize current drain and transitions, but that is another topic entirely.) In that case, if power had been off all memory not explicitly set would be zero by default. Another situation is when a memory checker runs, and leaves memory in a zero state (most do by design). Thus if the compiler doesn't initialize
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
memory, and the memory where the code is placed has not been used in a
^^^^^^
But this is the key: In the absence of an explicit initializer, an ISO-compliant compiler *must* generate code to properly initialize static memory (not automatic or dynamic memory) just as if the default initializer had been provided explicitly.
Proper initialization means that floats and doubles must be initialized to 0.0 and pointers must be initialized to the null pointer value, even if those bit patterns differ from all-bits-zero. (calloc() must initialize its memory to all-bits-zero.)
If you don't believe me, how about the Usenet News comp.lang.c FAQ? See http://c-faq.com/decl/index.html for a general discussion of allocation and initialization, but pay particular attention to http://c-faq.com/decl/initval.html:
-----------------------------------
comp.lang.c FAQ list Question 1.30
Q: What am I allowed to assume about the initial values of variables and arrays which are not explicitly initialized? If global variables start out as ``zero'', is that good enough for null pointers and floating-point zeroes?
A: Uninitialized variables with static duration (that is, those declared outside of functions, and those declared with the storage class static), are guaranteed to start out as zero, just as if the programmer had typed ``= 0'' or ``= {0}''. Therefore, such variables are implicitly initialized to the null pointer (of the correct type; see also section 5) if they are pointers, and to 0.0 if they are floating-point. [1]
Variables with automatic duration (i.e. local variables without the static storage class) start out containing garbage, unless they are explicitly initialized. (Nothing useful can be predicted about the garbage.) If they do have initializers, they are initialized each time the function is called (or, for variables local to inner blocks, each time the block is entered at the top[2] ).
These rules do apply to arrays and structures (termed aggregates); arrays and structures are considered ``variables'' as far as initialization is concerned. When an automatic array or structure has a partial initializer, the remainder is initialized to 0, just as for statics. [3] See also question 1.31.
Finally, dynamically-allocated memory obtained with malloc and realloc is likely to contain garbage, and must be initialized by the calling program, as appropriate. Memory obtained with calloc is all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 7.31, and section 5).
References: K&R1 Sec. 4.9 pp. 82-4 K&R2 Sec. 4.9 pp. 85-86 ISO Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3 H&S Sec. 4.2.8 pp. 72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96, Sec. 16.1 p. 386
[1] This requirement means that compilers and linkers on machines which use nonzero internal representations for null pointers or floating-point zeroes cannot necessarily make use of uninitialized, 0-filled memory, but must emit explicit initializers for these values (rather as if the programmer had).
[2] Initializers are not effective if you jump into the middle of a block, either with a goto or a switch. Initializers are therefore never effective on variables declared in the main block of a switch statement.
[3] Early printings of K&R2 incorrectly stated that partially-initialized automatic aggregates were filled out with garbage. -----------------------------------
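(A small illustration of those rules, with made-up names: every static object below must start as the appropriate kind of zero even on machines where null pointers or 0.0 are not all-bits-zero, while the automatic is indeterminate until it is assigned:)

#include <stdio.h>

static double weight;      /* must start as 0.0            */
static int   *cursor;      /* must start as a null pointer */
int           table[4];    /* file scope: every element starts as 0 */

int main(void)
{
    int scratch;           /* automatic: indeterminate, do not read it yet */
    int ready = 1;         /* automatic with an explicit initializer */

    scratch = 7;           /* well defined only after an assignment */
    printf("%g %d %d %d %d\n",
           weight, cursor == NULL, table[2], ready, scratch);
    return 0;
}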
prior run, the variable space will be zero. But if the program is deleted, and the memory filled with a nonzero pattern, and the code reloaded and compiled, the result may be much different, and can cause the program to crash. When the program is saved to disk as an executable, the memory pattern that is saved is the last state of the code, whatever that was, and depending on how the code development system saves the code, the variables may or may not be set to zero at save time. At load time, the memory will be initialized according to the data in the executable file.
So, while the compiler may initialize the variables, there are other issues that can impact the true state at run time, and therefore default state should not be relied on as the condition.
Yes, I know all this. I've been programming since the 1960s and writing C since the 1980s.
After all, you create a variable to store information, don't you? Why would you not initialize it?
As I said, even if you are guaranteed that initialization will take place, it can't hurt and might help readability to do it explicitly anyway.
Anyway, while this has been a good discussion, I hope that you have begun to realize that all is not just in the compiler, but in the implementation, in the memory of the system, and in the methods of implementing and running code.
Sure. But in this case, the compiler's guarantee trumps all that.
And by the way, Matthew, this is in no way criticizing you. I have heard of you before, and will probably hear great things from you in the future.
Good luck, and good fortune.
And the same to you, sir.
Was that what you were expecting?
But you still haven't answered this question, nor explained how your code demonstrates the difference between "the heap" and "the stack".
I cannot vouch for every compiler, only Microsoft, Sun, and Instant C off the top of my head. I have used a few other packages as well. But any really good programmer NEVER relies on system initialization. It is destined to fail you at bad times.
How much effort are you willing to expend to defend against potentially buggy compilers (as opposed to undefined or implementation-defined behaviors)? The Intel fdiv bug would seem to prove that you should NEVER rely on arithmetic instructions to provide the correct answer. There's an economic tradeoff between protecting yourself from all conceivable errors and actually getting work done.
There is a difference between implementation differences and hardware errors, which was the microsoft error. They had a bug in their silicon compiler that caused that IIRC.
I misspoke here, and said Microsoft, when I meant Intel.
I could just as easily reference some other obscure compiler bug or implementation-defined behavior and make the same point. The thing about a standard is that there are clear requirements about what is implementation-defined and what is not. Static initialization in ISO C is not one of those implementation-defined things.
I will concede that explicit initializations--even to default values--might be a useful self-documentation tool.
One case is as has been pointed outhere, that NULL is sometimes 0, sometimes 0x80000000, and sometimes 0xffffffff. Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending on the implementation. But strings always end in a character NULL or 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers. They may do otherwise on some others. It can byte (;-) you if you are not careful.
In your source code, NULL is *always* written 0 (or sometimes (void *) 0 to indicate that it's intented to stand for a null pointer value, not a NUL character value). The string terminator character is *always* written '\0'. The machine's representation of that value is immaterial. If you type-pun to try to look at the actual machine's representation, your program's behavior is undefined and you deserve what you get. It's the compiler's responsibility to ensure that things work as expected, no matter what the machine's representation is. (For example, '\0' == 0 must return 1.)
'\0' is an escape forcing the 0, so of course this will be equal.
OK. But the main point is that it doesn't matter what bit pattern represents a null pointer. Your source code will always use the value 0 to represent it. For example,
int *p; /* ...code that sets p... */ if ( p == 0 ) /* *not* if ( p == 0x80000000 ) or if ( p == 0xffffffff ) */ { /* ...handle null pointer value... */ }
Actually this is one of the problem areas. 0 is an explicit, and is actully zero. Only if using c++ and equality is overloaded for pointers will this work. Otherwise the actual contents of p will be used to compare to 0 and that will fail in some systems. Some compilers may deal with it as you expect, but I have not used one that did.
No, I may have been mistaken about ints and chars, but in a pointer context, 0 means a null pointer, whatever bit pattern represents it, and an ISO-compliant compiler *must* do the right thing. Again, the comp.lang.c FAQ covers null pointers in great detail (http://c-faq.com/null/index.html), but in particular, there's this (http://c-faq.com/null/machnon0.html):
----------------------------------
comp.lang.c FAQ list Question 5.5
Q: How should NULL be defined on a machine which uses a nonzero bit pattern as the internal representation of a null pointer?
A: The same as on any other machine: as 0 (or some version of 0; see question 5.4).
Whenever a programmer requests a null pointer, either by writing ``0'' or ``NULL'', it is the compiler's responsibility to generate whatever bit pattern the machine uses for that null pointer. (Again, the compiler can tell that an unadorned 0 requests a null pointer when the 0 is in a pointer context; see question 5.2.) Therefore, #defining NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other: the compiler must always be able to generate the machine's correct null pointers in response to unadorned 0's seen in pointer contexts. A constant 0 is a null pointer constant; NULL is just a convenient name for it (see also question 5.13).
(Section 4.1.5 of the C Standard states that NULL ``expands to an implementation-defined null pointer constant,'' which means that the implementation gets to choose which form of 0 to use and whether to use a void * cast; see questions 5.6 and 5.7. ``Implementation-defined'' here does not mean that NULL might be #defined to match some implementation-specific nonzero internal null pointer value.)
See also questions 5.2, 5.10 and 5.17.
References: ISO Sec. 7.1.6 Rationale Sec. 4.1.5 ----------------------------------
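(In practice that means all of the spellings below request the machine's real null pointer, whatever its bit pattern; only the memset() at the end works on raw bits and is therefore not a portable way to obtain a null pointer:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    int *p = 0;            /* compiler emits the machine's null pointer */
    int *q = NULL;         /* same thing, spelled with the macro        */
    int *r;

    if (p == 0 && q == NULL && !p)
        printf("all of these are equivalent null tests\n");

    /* Zeroing the bytes only yields a null pointer on machines whose
       null representation happens to be all-bits-zero. */
    memset(&r, 0, sizeof r);
    return 0;
}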
And since that is so, how are those variables initialized? and to what value? What is a pointer set to when it is intialized. Hint, on Cyber the supposed default for assigned pointers used to the the address of the pointer. Again, system dependencies may get you.
Pre-ANSI/ISO compilers might have initialized static memory to all-bits-zero even when that was not the correct representation of the default for the type being initialized. ANSI/ISO compilers are not allowed to do that. The required default initializations are well defined. (This is the sort of thing that motivates the creation of standards in the first place.)
And those systems that used the first location to store the return address are not re-entrant, without other supporting code in the background. I think I used one of those once as well.
There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact several standard library routines are known to not be re-entrant.
This is true, but knowing that the base code is not reentrant due to design constraints or due to hardware constraints makes the difference on modern multithreaded systems, where the same executable memory can be used for the program (if the hardware allows that).
Sure, you need to know that you can compile re-entrant code if you need it.
PS. A stack doesn't necessarily mean a processor call and return stack. It is any mechanism of memory address where the data is applied to the current location, then the pointer incremented (or decremented depending on the architecture).
But usually in the context of discussions about compiler architectures, call stacks are exactly what is meant.
I am not sure that is true, because in some implementations, the data heap and stack are in the same segment of memory, while the runtime stack for the processor is somewhere else. For high security systems running this should be a requirement. It prevents obvious means of inserting malicious code through variable initialization, and then stack manipulation. I say should be, because it has been tossed around from time to time, but I am unsure if it has ever been formalized.
One system I worked on looked like this: init jump heap variable stack (push down) program entrance program local libraries relocation table symbol table (if not removed) machine stack
Unfortunately I no longer remember which system that was. Just the fact that some standard libraries at that time would not run on it because they did manipulate the stack.
Regards, Les H
I have said all that I know. I hope it helps you all in the future. C is wonderful, compact, close to the machine, and a good language, capable of expressing many many complex concepts. I am sure there are other languages out there, and I have used a few, but I love C.
Hear, hear! But like any complex language, it is not without its subtleties.
Regards, Les H
Matthew Saltzman wrote:
Sorry, readers, this is getting rather long and still pretty OT. Les, if you want to continue, perhaps we should take it offline?
[snip]
Note I said "demonstrate", not "prove". For a math teacher, there's an important distinction 8^).
Sure is!
[snip]
state (most do by design). Thus if the compiler doesn't initialize
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
memory, and the memory where the code is placed has not been used in a
^^^^^^
But this is the key: In the absence of an explicit initializer, an ISO-compliant compiler *must* generate code to properly initialize static memory (not automatic or dynamic memory) just as if the default initializer had been provided explicitly.
Umm, more correctly, the implementation must provide a way for that to happen. This code may or may not be generated by the compiler, and it may or may not be part of the program. It may, for example, be part of the program loader.
[snip]
If you don't believe me, how about the Usenet News comp.lang.c FAQ?
This is irrelevant. What is relevant is the Standard.
[snip]
As I said, even if you are guaranteed that initialization will take place, it can't hurt and might help readability to do it explicitly anyway.
It might. It might also mean that a program which runs in a non-conforming implementation could restart more quickly :-)
It also might make the program too big to fit into memory.
Mike
Les wrote:
On Fri, 2007-06-01 at 19:02 -0400, Matthew Saltzman wrote:
[snip]
Maybe I have misunderstood you. Are you saying that the compiler alone is not enough? If so, then I agree with you. However, if you mean that the initial state of statically allocated variables is not known at program start up, then I have to say that I believe you are incorrect.
But I confess, I don't see how this code proves your point. It does demonstrate that globals are initialized by default, though.
Actually, it doesn't. And this is the problem. Many people assume that because they obtained 0 one time, that the value was set in memory by some behind the scenes action of the compiler. In fact the memory could
Not the compiler, the implementation. The implementation is required by the Standard to initialize all statically allocated variables to certain values before invoking the designated C function (usually main(.,.)).
[QUOTE MODE ON] 5.1.2 Execution environments
[#1] Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment. All objects in static storage shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified. Program termination returns control to the execution environment. [QUOTE MODE OFF]
have been set by any of a number of actions. Some memory chips start with all data zero'ed (at the output, at the physical layer the construction is designed to minimize current drain and transitions, but that is another topic entirely.) In that case, if power had been off all memory not explicitly set would be zero by default. Another
This is irrelevant. Les, I've supported real time systems that behave as you describe, and try to "restart" rather than "start again" in the interests of increasing availability. That's useful, but non-conforming behavior.
In fact, one system I supported had three levels of startup: initial (reload all programs and start them from scratch), hard restart (using the program images in memory, restart them from scratch), and soft restart (restart the initial task in the programs, but do not attempt to re-initialize everything). The restarts were escalated from soft to hard to initial. About 50% of the time soft restarts would work, and about 98% of the time hard restarts would work. Some programs simply were marked "do not attempt restart" as they did significant H/W initialization, and could not hope to disentangle the current state. A system with such a program simply never tried anything but hard restarts and initial restarts.
situation is when a memory checker runs, and leaves memory in a zero state (most do by design). Thus if the compiler doesn't initialize memory, and the memory where the code is placed has not been used in a prior run, the variable space will be zero. But if the program is
What you've just proven is that the implementation must do some work before invoking the designated C function. I've worked on such systems. In fact, you might look into the meaning of USS and BSS.
[snip]
So, while the compiler may initialize the variables, there are other issues that can impact the true state at run time, and therefore default state should not be relied on as the condition. After all, you create a
The Standard is (intended to be) written such that what you describe cannot happen.
variable to store information, don't you? Why would you not iinitialize it? Anyway, while this has been a good discussion, I hope that you have
Because the implementation already initializes it for you. There are two ways:
SomeType_t SomeVar = SomeValue;
AnotherType_t AnotherVar;
The implementation "knows" of efficient ways on the target architecture to accomplish what you want. In the first place, it probably puts all statically allocated variable into a special region, and then uses a block copy instruction to copy the entire initial values all in one fell swoop. In the second, it probably puts the statically allocated variable into a region whose size is simply noted in the object code, and this region is initialized by using some sort of fill instruction.
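(A sketch of the two cases, with made-up names: on typical toolchains the first lands in an initialized-data section whose contents are copied or loaded as a block, while the second only has its size recorded and is zero-filled before main() runs, which is essentially what the old "BSS" arrangement is for:)

/* Explicitly initialized static object: its initial value has to be
   stored in the program image and put in place before startup. */
int watermark = 4096;

/* Statically allocated, no explicit initializer: the implementation
   just notes the size and zero-fills the region. */
int counters[1024];

int main(void)
{
    /* Both conditions hold before any other code runs. */
    return (watermark == 4096 && counters[512] == 0) ? 0 : 1;
}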
Mike
On Thu, 31 May 2007, Les Mikesell wrote:
C was old before ANSI came along. Maybe we could revive the discussion of why "abcd"[2] must evaluate to 'c'.
Or for that matter, why 2["abcd"] == 'c' too.
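(Both follow from the definition of subscripting: E1[E2] means *((E1) + (E2)), and pointer-plus-integer addition commutes, so the operands can be written in either order. A one-line check:)

#include <stdio.h>

int main(void)
{
    /* "abcd"[2] and 2["abcd"] both mean *("abcd" + 2), i.e. 'c' */
    printf("%c %c\n", "abcd"[2], 2["abcd"]);
    return 0;
}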
On Thu, 31 May 2007, Mike McCarty wrote:
Les wrote:
On Thu, 2007-05-31 at 05:52 -0500, Mike McCarty wrote:
Andy Green wrote:
Mike McCarty wrote:
[snip]
C doesn't initialize what? It initializes all used variables.
Not if they're on the stack. You should get a compiler warning nowadays... but don't count on it!
Erm, C knows nothing about a "stack". However, it is true that automatic variables are not necessarily initialized. I should have stated that all statically allocated variables are initialized. Thanks for the correction.
Mike
That must be something recent. Also some people believed that it did in
IIRC, it was mentioned in K&R v1.
My K&R1 is lost to history 8^(, but my H&S 3rd edition spent a lot of space on the difference between "traditional" or "K&R" and ANSI compilers. In the section on default initialization the only static vars that were *not* required to be initialized in traditional compilers were unions. The note says that some traditional compilers would not permit such initializations. The ANSI requirement is that the union be initialized to the default for the first element.
The other note was that some traditional compilers would initialize a static float/double to all-bits-zero, even if that was not the representation of 0.0. That's not permitted in ANSI C.
[snip]
This part I am sure of, because I have had to fix many, many people's code due to this belief. The ANSI committee may have changed the standard, but I would bet that a lot of older compilers still generate code with no initialization.
Older? ANSI C is since 1989. I guess one could characterize 19 years as "older". :-)
And those older-than-dirt compilers were even broken for their age if that's the way they behaved.
Mike
Matthew Saltzman wrote:
On Thu, 31 May 2007, Mike McCarty wrote:
[in regard to initializing statically allocated variables]
IIRC, it was mentioned in K&R v1.
My K&R1 is lost to history 8^(, but my H&S 3rd edition spent a lot of
So is mine :-(
space on the difference between "traditional" or "K&R" and ANSI compilers. In the section on default initialization the only static vars that were *not* required to be initialized in traditional compilers were unions. The note says that some traditional compilers would not permit such initializations. The ANSI requirement is that the union be initialized to the default for the first element.
The other note was that some traditional compilers would initialize a static float/double to all-bits-zero, even if that was not the representation of 0.0. That's not permitted in ANSI C.
Thank you for the clarification. Also not permitted in ANSI C is initializing a static pointer to all-bits-zero if that is not (one of) the permitted representations of a null pointer.
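A tiny example of what ANSI guarantees for default static initialization, regardless of the bit patterns the machine uses:

#include <stdio.h>

static double d;   /* must compare equal to 0.0, even on a machine where
                      0.0 is not all-bits-zero                            */
static char  *p;   /* must compare equal to NULL, even where a null
                      pointer is not all-bits-zero                        */

int main(void)
{
    printf("%d %d\n", d == 0.0, p == NULL);   /* prints: 1 1 */
    return 0;
}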
Mike
On Wed, 30 May 2007, Rick Stevens wrote:
Once the concept of splitting up libraries came up, lots of splits were proposed: string handling was going to be in a separate library, network stuff, file management, you name it. Some people actually did implement separate libraries, as the famous Sun network library split shows.
Nyet. A *camel* is a horse designed by a committee. A giraffe is a leopard designed by a committee.
Chris Schumann wrote:
Late reply; sorry.
Date: Thu, 24 May 2007 08:43:26 -0700 From: Les hlhowell@pacbell.net
Embedded applications today are mostly 8 bit, but many, many designers have already begun the transition to 16 bit, and soon will be moving to 32 bit. The reasons are much the same as the reasons that general computing has moved from 8 to 16 to 32 and now to 64, with the cutting edge already looking at 128 bit and parallel processing, along with dedicated processors running 32 or 64 bit floating point math. Also the length of the integer used in C, which is a virtual machine is independent of the word length of the processor, except the C language designers (originally Kernigan and Ritchie) made the language somewhat flexible to simplify migration. That is why there were some undefined situations in the original specification. Remember that C is a virtual machine language, whose processor only has 24 instructions (I think the Ansi committee added a couple, but they have specific uses that were not foreseen in the original usage of the language) It can be ported to any machine currently extant by only writing about 1K of machine code, and even that can be done in another available higher level language if you so desire, as long as it is compiled for efficiency.
Having used C since the original K&R version, I have to ask WHAT?!?
Since when is C a virtual machine language?
I believe that the reference is to this language from the Standard:
5.1.2.3 Program execution
[#1] The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.
[snip]
(and because it is a virtual machine language...)
That is why even the 8 bit implementations of C used a 16 bit integer.
No it's not. They used 16 bit integers because you can't do much
They used 16 bit integers because the original compiler was for the PDP series of machines. The architecture of that machine influenced several aspects of the language. As the compiler evolved, some of the architectural aspects of that were removed, but not all by any means.
of anything useful with only 8 bit integers. The compiler designers for those systems (like the Apple II) had to work around the 8 bit registers. Looking at the assembly-language source for some of the libraries was not pleasant.
The argument goes the wrong way. The reason the PDP series used 16 bit integers is that not much can be done with 8 bit integers. This is what influenced the compiler.
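For what it's worth, the width of int is still implementation-defined today; the Standard only guarantees at least 16 bits, and <limits.h> reports what a given implementation chose:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT = %d, sizeof(int) = %u, INT_MAX = %d\n",
           CHAR_BIT, (unsigned)sizeof(int), INT_MAX);
    return 0;
}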
Mike