The back end and libraries are now up for general testing on a variety of boxes and operating systems. The tarball is usually updated as the web page is. To download and install g95 on unix systems, run the following command (originally from Joost Vandevondele):

wget -O - http://ftp.g95.org/g95-x86-linux.tgz | tar xvfz -

This will create a directory named 'g95-install' in the current directory. Run (or better yet make an appropriate symbolic link to) ./g95-install/bin/i686-pc-linux-gnu-g95 in order to run g95.

2000 Archive

December 31

Lots of small fixes. Fixed the problem with functions introduced last time. The problem has to do with creating only one symbol node for contained module functions. In the contained module the function is a variable and in the module namespace the function is really a function.

Another problem was the parsing of a function call of the form f(a==b). The argument-matcher wasn't discarding the 'a=' like it should have.

Fixed a problem with the formal arguments of statement functions being declared as 'SAVE'. Solved the problem by not applying the 'DUMMY' attribute to entities in these arguments.

December 20

Looks like the fix to the function declaration parsers had some problems. No time to fix it today, as I am preparing to head home for Christmas. My brother has a DSL connection, so I may be able to get some stuff done, but I'm not going to count on a lot. I should be back in Arizona around the 30th.

December 19

Last night's fix fixes loads of things-- LAPACK now mostly compiles. Fixed another problem-- Generic names have to be given the 'function' or 'subroutine' attribute, depending on the type of procedures that their interface contains. Also added error messages for generic interfaces that contain a mixture of subroutines and functions.

A while back, Michael Richmond reported a problem having to do with what attributes a function has in its own program unit. Unless a result variable is specified, the function name is really a 'variable'. This initially led me to an important fix in g95_match_actual_arglist()-- Actual argument lists are the one place where procedure names can be used as expressions of a sort, with a typespec of BT_PROCEDURE.

Anyhow, the problem appears fixed.

December 18

Fixed the function g95_match_optional(), which was adding the 'intrinsic' attribute. This was causing problems in a module created by the LAPACK library that prevented compiling everything else in that directory.

December 17

Fixed a few problems left over from yesterday that I missed-- this included trying to do numeric operations on expression nodes that didn't even represent constants.

Fixed a problem in the matching of structure constructors-- the components weren't being copied if they came from a parent namespace.

g95_charlen structures are now written and read in modules.

December 16

Finished the modifications and hooked the new expression subroutines into the expression parser. As far as I've tested, things are back to where they were. I've checked in the changes to CVS in anticipation of the automatic test suite running later tonight. Sourceforge has upgraded their hardware, and secure-copy still doesn't quite work...

I've also made some modifications to the contributions page to reflect the FSF's new assignment procedures.

December 15

Not much time tonight, but more work on the overhaul. The only things left to do are to remove the current scheme for simplifying expressions and hook the new subroutines into the main expression parser. It looks like this new scheme is going to be even more general than I had previously thought-- the stuff I am doing now will also have to handle simplification of intrinsic expressions, even those with array arguments.

December 13

Worked more on the arithmetic overhaul. Almost ready to start testing, then I can delete some cruft that has been around since day one.

I did get an answer from the FSF regarding their new assignment procedures and have mailed Walter back-- he was thinking about implementing the overlap checking for SELECT CASE statements. I'll try to update the web page sometime tomorrow. I am going skiing, so it may not happen...

December 12

Whoa! A whole week just slipped by! What can I say-- I've been busy, trying to get things done that have to get done. Walter Silvestri let me know that the link to the copyright assignment forms is broken. As far as I can tell, the forms are no longer available on the web.

While sane people would wonder why an organization devoted to free software would hide the legal forms needed to contribute to that organization, the FSF is more than a little given to elitist lunacy. For example, check this out, third paragraph.

Anyone who gets upset about personal pronouns having a gender probably has other serious problems as well.

Anyhow, I am working on the situation re assignment forms.

G95 stuff tonight consisted of writing down a list of restrictions that the F language places on fortran. This is part of a revamping of the contributions page which has been allowed to languish too long. Walt Brainerd sent me a list of these a while back, which I lost, but a more comprehensive list was on his website.

December 5

Still rearranging things within arith.c. The changes I've been working on basically reorder the simplification of expressions. It is not that important to be able to simplify every possible expression-- the back end takes care of collapsing constants in different parts of an expression tree and there is no sense duplicating this in the front end.

One thing that the front end has to be able to do is to reduce an expression that is composed completely of constants into the right constant at compile time-- to determine the value of a PARAMETER, for example.

In the previous version, the expression parser built a tree of expression nodes. Later, a recursive simplification function was called that could reduce a constant expression to the right constant. In an initialization expression, function references bind to intrinsics automatically unlike function references anywhere else. The simplification function had a flag to attack functions in this manner.

The new version switches the order a bit. When the expression parser has two summands that it needs to combine, it will call

g95_add(op1, op2)  

which will return a new node that is the sum of the two nodes. If the two nodes happen to be constant, the new node will be the arithmetic sum of op1 and op2. Later on, if the expression is an initialization expression, a stripped-down version of the current simplification function will call the right intrinsic function handler to do its thing.
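A minimal sketch of the folding idea described above, assuming a much-simplified node that only knows integer constants, variables and additions. The names expr, make_constant(), make_variable() and sketch_add() are hypothetical and are not taken from the g95 sources; plain long arithmetic (with no overflow handling) stands in for the real arithmetic routines.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical expression node: a constant, a variable, or an addition. */
typedef enum { EXPR_CONSTANT, EXPR_VARIABLE, EXPR_ADD } expr_kind;

typedef struct expr {
    expr_kind kind;
    long value;              /* meaningful only for EXPR_CONSTANT */
    const char *name;        /* meaningful only for EXPR_VARIABLE */
    struct expr *op1, *op2;  /* meaningful only for EXPR_ADD */
} expr;

static expr *new_expr(expr_kind kind)
{
    expr *e = calloc(1, sizeof *e);
    assert(e != NULL);       /* a real compiler would report out-of-memory */
    e->kind = kind;
    return e;
}

expr *make_constant(long value)
{
    expr *e = new_expr(EXPR_CONSTANT);
    e->value = value;
    return e;
}

expr *make_variable(const char *name)
{
    expr *e = new_expr(EXPR_VARIABLE);
    e->name = name;
    return e;
}

/* Combine two summands. Two constants fold to a single constant node
   right away; anything else becomes an EXPR_ADD node that owns its
   operands. Overflow is ignored in this sketch. */
expr *sketch_add(expr *op1, expr *op2)
{
    if (op1->kind == EXPR_CONSTANT && op2->kind == EXPR_CONSTANT) {
        expr *sum = make_constant(op1->value + op2->value);
        free(op1);
        free(op2);
        return sum;
    }
    expr *sum = new_expr(EXPR_ADD);
    sum->op1 = op1;
    sum->op2 = op2;
    return sum;
}
```

With this shape, the parser never has to ask whether it is building a tree or computing a value: sketch_add(make_constant(2), make_constant(3)) comes back as a constant node, while adding a variable to a constant comes back as an addition node.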

While it sounds like an aesthetic rearrangement, this will make arithmetic a lot easier to do, particularly within the functions that reduce intrinsic expressions at compile time. As Katherine Holcomb found out, the g95_arith_* functions are a pain in the butt to use.

One particularly painful place is array constants created from array constructors. Under the new scheme, you just call g95_add(); it notices that it is dealing with arrays and calls g95_arith_plus() repeatedly to do its job. Expressions can be constructed and reduced using the same functions.
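The array case can be sketched the same way. This is only an illustration under stated assumptions: const_operand, scalar_plus() and sketch_add_const() are made-up names, arrays are one-dimensional with a small fixed capacity, and a scalar operand is applied to every element of the other operand.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ELEMS 8

/* Hypothetical constant operand: a scalar (n == 0, value in elem[0])
   or a one-dimensional array constant with n elements. */
typedef struct {
    size_t n;
    long elem[MAX_ELEMS];
} const_operand;

/* Stand-in for a scalar arithmetic primitive. */
static long scalar_plus(long a, long b)
{
    return a + b;
}

/* Add two constant operands by calling the scalar primitive once per
   element. Returns 0 on success, -1 when two arrays differ in size. */
int sketch_add_const(const const_operand *a, const const_operand *b,
                     const_operand *result)
{
    if (a->n != 0 && b->n != 0 && a->n != b->n)
        return -1;

    size_t n = a->n != 0 ? a->n : b->n;
    assert(n <= MAX_ELEMS);
    result->n = n;

    size_t count = n == 0 ? 1 : n;   /* a scalar result has one value */
    for (size_t i = 0; i < count; i++) {
        long x = a->n == 0 ? a->elem[0] : a->elem[i];
        long y = b->n == 0 ? b->elem[0] : b->elem[i];
        result->elem[i] = scalar_plus(x, y);
    }
    return 0;
}
```

The caller does not need to know whether it holds scalars or arrays; the dispatch lives in one place.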

While I've been planning this for a while, the real impetus was noticing that the bulk of the problems in the test suite are a failure to reduce initialization expressions.

December 4

More work on the expression node overhaul. The changes aren't that major, but reflect some fundamental changes to parts of g95 that have been around since the early days when it started out as an expression parser. More explanations later-- it is quite late.

December 2

Andreas Schweitzer reported an internal error associated with freeing the IOLENGTH form of the INQUIRE statement. The problem has been fixed.

Rob Cermak pointed out that module files are being written from modules that have errors. This is a very bad idea, since it will fool the 'make' program into thinking that a source file does not need to be recompiled. I've changed things so that a module file is not written in this case and a previously existing module file is deleted.
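A rough sketch of that policy, assuming a hypothetical end-of-module hook that receives the error count. finish_module() and file_exists() are illustrative names only; the real module writer is certainly more involved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical end-of-module hook. Returns 1 if a module file was
   written, 0 if it was suppressed because of errors, -1 on I/O failure. */
int finish_module(const char *mod_path, const char *mod_text, int error_count)
{
    assert(mod_path != NULL && mod_text != NULL);

    if (error_count > 0) {
        /* Remove any stale file so 'make' sees the target as missing.
           A failure here is ignored: the file may simply not exist. */
        remove(mod_path);
        return 0;
    }

    FILE *f = fopen(mod_path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(mod_text, f) != EOF;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(mod_path);   /* do not leave a half-written module behind */
        return -1;
    }
    return 1;
}

/* Helper for experiments: can the file be opened for reading? */
int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```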

Started overhauling the expression handling to deal with array constants and make intrinsic arithmetic easier.

November 30

Rob Cermak reported that last night's fixes to the KIND intrinsic caused a huge jump in the number of source files successfully parsed by g95. Mark Dewing posted the URLs of a couple of perl tools that can be used to create makefiles by reading directories of fortran 90 source files. Hopefully this will improve things even more.

I came up with a third way out of last night's dilemma. Instead of either option, I've opted for simply copying the component lists of derived types when the proper type is in a parent program unit. This in effect defines a separate but equal type. Added a couple more diagnostics to the parsing of derived types-- now, you can only define a type once!

Also added command-line options -ffree-form and -ffixed-form. This came up on the mail list the other day. They cause the source file to be parsed as fixed or free form without regard to the filename extension.
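The precedence can be sketched as below. The extension table is an assumption made for illustration (.f90/.f95 taken as free form, .f/.for as fixed form), and source_form and pick_source_form() are hypothetical names rather than anything from the g95 sources.

```c
#include <assert.h>
#include <string.h>

typedef enum { FORM_UNKNOWN, FORM_FIXED, FORM_FREE } source_form;

/* 'forced' is FORM_FIXED or FORM_FREE when -ffixed-form or -ffree-form
   was given, and FORM_UNKNOWN when neither option was used. */
source_form pick_source_form(const char *filename, source_form forced)
{
    assert(filename != NULL);

    if (forced != FORM_UNKNOWN)
        return forced;               /* the option wins over the extension */

    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return FORM_UNKNOWN;

    /* Assumed extension table, for illustration only. */
    if (strcmp(dot, ".f90") == 0 || strcmp(dot, ".f95") == 0)
        return FORM_FREE;
    if (strcmp(dot, ".f") == 0 || strcmp(dot, ".for") == 0)
        return FORM_FIXED;
    return FORM_UNKNOWN;
}
```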

November 29

Fixed several minor problems found by the test suites. These included a core dump in the KIND() intrinsic. I've also changed structure I/O to generate placeholder code instead of an internal error. Generating code to print structures is going to have to wait until there is more machinery for generating code...

The major dilemma of the night was the forward referencing of types-- There was a special case to allow functions to be declared of a derived type before the type was defined. It turns out this special case is actually the rule. Variables of a derived type can be declared before the type itself.

With host program units this creates a problem. If a derived type variable is declared, then what does it refer to? If the type is defined later, then that is what is used. If no type is defined, then it takes the type defined in a parent program unit. We've got to create the symbol node for the type when it is first used, since those typespec structures have to point to something. But it might turn out later to be wrong.

Two possibilities suggest themselves. Either we make a pass through the entire namespace when the first non-declaration statement is encountered and update typespec structures to point to the right thing, or we write an accessor function that returns a symbol node given a derived type typespec node.

At the moment, I'm kind of leaning toward the function or maybe just inlining something, since there aren't too many places that need to go from typespec structure to symbol node-- match_varspec() is the notable exception.
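The accessor-function option might look roughly like this. It is a sketch under assumptions: the typespec stores only the type name, names are already case-folded, and name_space, type_symbol, define_type() and resolve_derived() are hypothetical stand-ins for the real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 8

/* Hypothetical derived-type symbol. */
typedef struct {
    const char *name;
    int n_components;
} type_symbol;

/* Hypothetical namespace with a link to its host (parent) unit. */
typedef struct name_space {
    struct name_space *parent;
    type_symbol types[MAX_TYPES];
    size_t n_types;
} name_space;

/* The typespec records only the name; it is resolved on demand. */
typedef struct {
    const char *derived_name;
} typespec;

void define_type(name_space *ns, const char *name, int n_components)
{
    assert(ns->n_types < MAX_TYPES);
    ns->types[ns->n_types].name = name;
    ns->types[ns->n_types].n_components = n_components;
    ns->n_types++;
}

/* Search the current unit first, then each host unit in turn.
   Returns NULL if no definition is visible. */
const type_symbol *resolve_derived(const name_space *ns, const typespec *ts)
{
    for (; ns != NULL; ns = ns->parent)
        for (size_t i = 0; i < ns->n_types; i++)
            if (strcmp(ns->types[i].name, ts->derived_name) == 0)
                return &ns->types[i];
    return NULL;
}
```

Because the lookup happens at use time, a variable declared before its type picks up a later local definition automatically, and falls back to the host's definition otherwise.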

November 26

Made changes on where procedure names are stored-- they are now generally stored in parent namespaces. Debugged this and things look good-- the RK code now parses without any errors at all.

Erik Schnetter reported a problem with name matching-- only 30 characters were matched instead of 31.

November 23

Debugged the saving and loading of interfaces within a module. It appears to work. The real depressing thing about working on this is the realization that almost no one will use these dark corners of fortran 95...

November 22

Added private and public attributes to operator interfaces. These control whether these definitions are exported to a module or not. I also changed the public and private bits in the symbol_attribute to a single bitfield-- this lets us easily test for the ACCESS_UNKNOWN instead of requiring that two bits both be zero.
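A small sketch of the single-field idea. Only ACCESS_UNKNOWN is named in the text above; ACCESS_PUBLIC, ACCESS_PRIVATE, attr_sketch and set_access() are assumptions introduced for illustration.

```c
#include <assert.h>

/* ACCESS_UNKNOWN appears in the text; the other two names are assumed. */
typedef enum { ACCESS_UNKNOWN = 0, ACCESS_PUBLIC, ACCESS_PRIVATE } access_mode;

/* Hypothetical attribute structure: one two-bit field replaces the
   separate 'public' and 'private' bits. */
typedef struct {
    unsigned access : 2;   /* holds an access_mode value */
    unsigned dummy : 1;
    unsigned save : 1;
} attr_sketch;

/* Returns 0 on success, -1 if a different access was already set. */
int set_access(attr_sketch *attr, access_mode mode)
{
    assert(mode == ACCESS_PUBLIC || mode == ACCESS_PRIVATE);

    if (attr->access != ACCESS_UNKNOWN && attr->access != (unsigned)mode)
        return -1;
    attr->access = mode;
    return 0;
}
```

Testing for "not yet specified" becomes a single comparison against ACCESS_UNKNOWN, and the impossible state of both bits being set can no longer be represented.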

I've run across a case where it will be necessary to save PRIVATE symbols. Consider:

module a
  private g1

  interface g
    module procedure g1
  end interface

contains
  subroutine g1(x)
    logical x
    print *, x
  end subroutine

end module

Because g is exported with the module, g1 can be executed, even though it is not "accessible"-- "call g1" gives an error, while "call g(.TRUE.)" links and runs as expected. What will have to happen in this case is that the local name of g1 in the new namespace will be something that is illegal.

November 20

Checked in a couple of bug fixes. I've added an option from g77, -fdollar-ok, which allows dollar signs in entity names.

November 19

The g95_interface structure has been eliminated, and interfaces are now handled by linking the g95_symbol nodes together in lists. The changes were not that extensive and have been debugged. Did some work on handling the PUBLIC and PRIVATE attribute statements within modules-- these attributes alter use-associated symbols.

The larger change is that the access mode is not saved to symbols in a module-- in a situation where a module is used by another module, private symbols don't make it into the second module anyway, and their future accessibility depends on the second module.

Support was also added to allow a name defined by a MODULE PROCEDURE statement to be a function name (subroutine support is already there).

I've patched Katherine Holcomb's work on intrinsic.c so that the selected_real_kind() intrinsic always returns the default real kind for now. This will allow lots of code to be parsed without error.

November 15

Katherine Holcomb has checked in a large patch to intrinsic.c that is a first stab at implementing intrinsic functions, at least within the compiler itself. A problem with my thesis project is currently interfering with progress on g95.

November 13

Started an overhaul of how interfaces are handled. In particular, it appears that the g95_interface is not strictly necessary-- I think interfaces can be done by linking symbol nodes together without using another structure. No checkin tonight, and probably not for a few days.

November 11

Lots more bug fixes all over, with the idea of getting the problem count in the regression tests down. Added reference counting to symbol nodes.

November 10

Lots of bug fixes in diverse areas, fixing problems found in the regression tests. Once the error count is way down on these, it will be easier to spot a new problem that has been introduced inadvertently.

November 9

Worked yesterday and today on interfaces. This is a digression from modules in the sense that interfaces and such should be fully supported before they are saved/restored. One thing that has become clear is that symbol nodes are going to have to be able to reside in more than one namespace. For example, a name associated with a module procedure has to live in the module's namespace because all of the contained program units have to be able to find it. It also has to reside in the subprogram's namespace because that name cannot be reused within the contained namespace. The same holds for contained program units within program units.

The upshot is that reference counts are going to be needed in symbol nodes so that they can be correctly freed. This is also necessary for symbols that are referenced more than once through use association.
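Reference counting for shared nodes can be sketched as follows. symbol_sketch and the three functions are hypothetical; the live-node counter exists only so the behaviour can be observed in a test.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical symbol node that may be reachable from several namespaces. */
typedef struct {
    const char *name;
    int refs;
} symbol_sketch;

static int live_symbols = 0;   /* bookkeeping for experiments only */

symbol_sketch *symbol_new(const char *name)
{
    symbol_sketch *s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = name;
    s->refs = 1;               /* the creating namespace holds one reference */
    live_symbols++;
    return s;
}

/* A second namespace (or a USE association) takes another reference. */
symbol_sketch *symbol_share(symbol_sketch *s)
{
    s->refs++;
    return s;
}

/* Each namespace releases its reference when it is torn down; the node
   is freed only when the last reference goes away. */
void symbol_release(symbol_sketch *s)
{
    assert(s->refs > 0);
    if (--s->refs == 0) {
        live_symbols--;
        free(s);
    }
}

int live_symbol_count(void)
{
    return live_symbols;
}
```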

November 7

Watched the election tonight. I am something of a political junkie although I didn't have time to back anyone in particular this time around.

November 6

Applied a few more patches sent by Niels Jensen relating to matching deleted features. Started work on improving handling of interfaces-- nothing is checked in yet.

November 5

Applied patches sent by Niels Jensen that match some of the deleted statements of fortran, specifically the ASSIGN statement, the assigned GOTO and the H descriptor.

November 4

Worked on fixing some recurring problems with the format checking that one of my f77 test codes kept complaining about. The problem was one we've worked on before-- vetting formats that are strings (not in format statements). I essentially changed the code back to the way it was, which is to say reading from the string instead of the source file. Reading from the source file allowed printing an error locus, but had a couple of problems. The first was that format strings can be calculated by concatenating several strings together. In this case, our read-the-source method failed. The second was that it was complicated to convert the source file to a string, getting the escaped characters right and so on. So I switched things back to the way they were originally. The downside is that there needs to be a better error message for such strings-- we can't use the usual error reporting mechanism to highlight the problem.

I also worked on fixing some problems with the scanner. Tobi added some code a while back that ate end-of-line comments in fixed mode. I've added some analogous code in free mode. I also tightened the handling of continuation lines in character contexts: the '&' must now be the last character on the line, per the standard.

November 3

Worked on implementing the 'ambiguous' bit. After some thought, I ended up making g95_get_symbol() more elaborate, rather than trying to remember to check for ambiguity every time it is called. The difficulty was that g95_get_symbol() returns a symbol and the ambiguous bit has to be stored in the intermediate symtree. An entity (symbol) can have an ambiguous reference to it, but there could also be a perfectly clear reference to it by another name.

Since the ambiguous bit has to be stored in the symtree, the logical time to check it is when we are searching for the symbol itself. This then caused the problem that g95_get_symbol() must be able to return a failure condition-- before it always worked and returned a symbol node, even if it had to create the new node. This meant going through and changing all the places where it was referenced.
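A sketch of why the flag belongs on the name entry rather than on the symbol, and of a lookup that can fail. The types and find_symbol() are hypothetical; unlike the real routine described above, this version reports "not found" instead of creating a new node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entity. */
typedef struct {
    const char *name;
} symbol_sketch;

/* Hypothetical name entry: the ambiguous flag belongs to the local
   name, not to the entity it points at. */
typedef struct {
    const char *local_name;
    int ambiguous;
    symbol_sketch *sym;
} symtree_sketch;

typedef enum { LOOKUP_OK, LOOKUP_NOT_FOUND, LOOKUP_AMBIGUOUS } lookup_status;

/* The ambiguity check happens during the search itself, so callers
   cannot forget it; in exchange every caller must handle a failure. */
lookup_status find_symbol(const symtree_sketch *table, size_t n,
                          const char *name, symbol_sketch **result)
{
    *result = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].local_name, name) != 0)
            continue;
        if (table[i].ambiguous)
            return LOOKUP_AMBIGUOUS;
        *result = table[i].sym;
        return LOOKUP_OK;
    }
    return LOOKUP_NOT_FOUND;
}
```

The same entity can then be unreachable under one name and perfectly reachable under another.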

I've also added a use-associated bit in the symbol attribute structure-- there are function resolution rules and other conditions that treat use-associated variables different than other variables. In particular, the attributes of use-associated variables can't be modified after the USE...

November 2

Fixed more bugs in module reading and writing. On the first day I started g95, I downloaded a Runge-Kutta code that was about 16k lines long. About two months ago, it parsed after editing the modules out. Now it parses without any help. The only thing that is USEd is an integer parameter that determines the overall kind. But it works!

Just for fun, I checked out how some other f90 compilers do on this RK code. The module file left behind by g95 was 44k long. I figured that an ascii format would be less efficient than a binary one, but rationalized that disk was cheap. The IBM xlf90 compiler left behind a module that was 440k long. SGI f90 crashed with an internal error, as did PGI. Compaq fortran left a 160k module. To be fair, g95 is not writing everything it needs to, in particular interfaces. On the other hand, I can't see the module expanding by more than a factor of four, so g95 modules would appear to be efficient with space. And they could be made smaller without a lot of trouble.

I've also updated the binary.

I've added a link to Rob Cermak's g95 regression results page in the "links" section. This will make it a lot easier for me and others to find the page. Rob posted an update on the mailing list that is worth repeating:

Additions:
Ben Turner has been busy sending code for inclusion into the suite.

Changes:

* Andy has been sending comments on which codes should be flagged
as 'should fail' or removed completely (flagged for a later date).

* The order in which I do the tests is important. Some files require
that some code gets built before looking for a supporting .mod file.
IE: build things in the order that the Makefile might build these
things. This will take some time to sort out.

Try and fix things like:
"Fatal Error: Can't open module file 'best_rational.mod' for
reading: No such file or directory"

* Fixed detection of core files and getting some trivial stuff out of
the debugger (gdb). Error code for test_suite set to 5 (arbitrary).
See: http://gwynedd.rutgers.edu/g95/reports/2000/11/01/g95.html

* In-progress: additional summary/navigation pages

* Think about a fork()/watchdog thing for getting around
the infinite parsing loops and anything else that might
appear.

November 1

Fixed a couple of module-reading bugs. G95 can now read simple modules.

October 31

Rob Cermak let me know that Ben Turner has been the one tracking down all of the fortran 90 packages mentioned in the emails of several weeks ago. Thanks Ben!

I was reading an article yesterday that appeared in the Physical Review of 1955. The article was about a molecular binding calculation that is relevant to my thesis. To my great surprise, the author acknowledged "J. Backus" of IBM for his help in programming the IBM 701 used in the calculation. This was the time that he was actually involved in writing the first fortran compiler, although my source on the history of fortran says that it ran on the 704...

As far as g95 is concerned, the subroutines that read a module are now called by the parser and debugging has started on reading.

October 30

Tobi Schlüter sent a patch that allows the module subroutines to operate on files in directories besides the current.

I also checked out Rob Cermak's automatic regression results. He's added quite a lot of fortran 90 programs that are checked nightly.

I've added most of the rest of the code necessary to read a module. It compiles but is not tested at all yet.

October 29

I spent the first part of the day untangling the g95_array_spec structure from the symbol and component structures. Now these structures point to a g95_array_spec structure instead of containing it.

More work on modules-- checked in lots of changes. Simple modules (ie just variables) are now written correctly.

October 28

Lots of changes today. The module-writing subroutines are now called from the top level after a module has been parsed. A simple module consisting of two reals doesn't quite work yet, though. I'll try and check things in tomorrow when it has a better chance of working...

October 26

No code today, but lots of thought on how the symbol table should be structured so that it will work correctly. The file 'modules' has been checked into the doc subdirectory. Stan Whitlock of Digital, who I met at J3, mentioned at one point that they had to rewrite their implementation of modules three times. I can see why...

October 25

Started working on top-level read and write subroutines for modules. It's clear that some more thought needs to go into what has to happen.

Michael Metcalf mailed me about half a meg worth of test cases today. I've written an article for his Fortran Forum magazine about the evolution of free fortran compilers that should appear in the next issue.

October 24

Modified symbol table subroutines for handling modules. Also checked in the rather massive changes that have been made to module.c over the last week. I've also added an option for parsing the F subset of fortran 90/95.

October 20

Added reading and writing of GMP integers and floats. The code is still not checked in yet. Instead of finishing the rest of the low level stuff, I am going to finish the high-level things first (ie saving whole namespaces) so that some debugging can start to take place.

A couple of days ago, I started work on a new program that does a calculation that I've been wanting to do for a while. It's complicated enough that fortran 77 won't cut it-- I really need structures and recursion to tackle this problem, so I am currently writing my first program in fortran 90. It was either Kernighan or Ritchie that said that the best way to learn a language was to write programs in it. Writing a compiler is not a bad way either....

October 18

More on reading modules. Added reading/writing of iterators, constructor lists, various constants. Still lots of things to be done here.

October 16

Ok, Back. A number of problems were resolved with my "day job" this morning, and I can feel a g95 binge coming on. I spent some time tonight working on modules. There are sure a lot of data structures that have to be loaded and saved. Most of them are now coded, but the worst, the symbol nodes themselves, are going to be last and will involve some changes in how they are stored in a red-black tree.

Nothing has been tested yet (or checked in) and this is mostly due to the fact that I/O with symbol nodes will end up happening first when a namespace is saved. The interlinking of all these different subroutines also tends to make this an "all or nothing" sort of proposition.

October 11

Disposed of a bunch of bug reports. Fixed g95_match_init_expr() so that it would recognize constant structure and array constructors.

October 8

Tobi Schlüter added a -I option that specifies the directories that g95 should search when looking for files.

Worked more on modules, making an initial checkin. It compiles, but nothing is actually called yet.

October 7

Worked on modules today. No checkin yet. Sourceforge seems to be having troubles accepting scp connections....

October 6

Tobi Schlüter fixed a bug that prevented common blocks from being present in interface blocks. It took me a while to convince myself that this is legal but as far as I can tell it is. What use is a common block inside an interface block? Tobi also fixed a typo reported by Vikram bir Singh.

After a couple days of slacking, I have started on the subroutines needed to read a module. Several subroutines that are now involved in writing the debug information will probably be moved from symbol.c to module.c so that reading subroutines can be right next to writing subroutines.

October 3

Reworked the parser's handling of end of file. Michael Richmond pointed out a problem in this area a while ago when an error was not being generated. The parsing subroutines used to return a flag that indicated whether an unexpected EOF was found by a callee, and this value had to be propagated up the stack. Now we just longjmp() out of trouble-- the unexpected EOF is a show-stopper as far as the compiler is concerned.
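The control flow can be illustrated with a toy recursive parser for nested parentheses. Everything here (next_char(), parse_group(), parse_source()) is hypothetical; the only point is that the deep callee jumps straight back to the top level instead of returning a flag through every caller.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf eof_env;      /* where to land on an unexpected EOF */
static const char *cursor;   /* current position in the source text */

/* Lowest-level reader: running off the end of the input here is always
   unexpected, so it never returns an EOF flag to its callers. */
static int next_char(void)
{
    if (*cursor == '\0')
        longjmp(eof_env, 1);
    return *cursor++;
}

/* Recursive routine with no EOF plumbing: consume up to the matching ')'. */
static void parse_group(void)
{
    for (;;) {
        int c = next_char();
        if (c == '(')
            parse_group();
        else if (c == ')')
            return;
    }
}

/* Returns 0 if every group was closed, 1 on an unexpected end of input. */
int parse_source(const char *text)
{
    assert(text != NULL);
    cursor = text;

    if (setjmp(eof_env) != 0)
        return 1;                 /* arrived here via longjmp() */

    while (*cursor != '\0')       /* EOF between groups is expected */
        if (next_char() == '(')
            parse_group();
    return 0;
}
```

One design consequence worth noting: anything allocated between the setjmp() and the longjmp() is skipped over, so this style fits best when the error is fatal anyway.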

October 1

Rob Cermak has gotten his automatic test suite started. I checked out the first couple of problem reports. Several of the tests were not correct, but there were also a few problems found, some of which had to do with yesterday's changes to function representation.

Fixed a few problems found by Michael Richmond some time ago.

September 30

Finished and debugged changes to how functions and procedures are stored within symbol nodes. Hopefully things are a little closer to "right". I am getting used to using CVS at home.

Michael Richmond sent a mail a while back reporting that a SEQUENCE statement inside of a type declaration was improperly flagged as an error. The PRIVATE property was incorrectly flagged as well, so I am thinking I just got things backwards in my mind at the time. Also greatly expanded the number of conflicts that are detected.

Rob Cermak requested that g95 return error codes for regression testing. The codes are:

0: All went well
1: Warnings issued (no errors)
2: Errors issued
3: Fatal error (file not found, ran out of memory, etc)
4: Internal error (very bad)

He also sent a patch to print the number of errors and warnings at the bottom of a g95 run. This is especially useful if the -v switch is used.
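The mapping from a run's outcome to these codes could be sketched like this. The numeric values come from the list above; the RC_ names, exit_code_for() and the precedence between categories (worst outcome wins) are assumptions.

```c
#include <assert.h>

/* Values from the list above; the RC_ names are made up for this sketch. */
enum {
    RC_OK = 0,
    RC_WARNINGS = 1,
    RC_ERRORS = 2,
    RC_FATAL = 3,
    RC_INTERNAL = 4
};

/* Pick the exit code for a run. Assumption: the most severe outcome
   determines the code. */
int exit_code_for(int warnings, int errors, int fatal, int internal)
{
    assert(warnings >= 0 && errors >= 0);

    if (internal)
        return RC_INTERNAL;
    if (fatal)
        return RC_FATAL;
    if (errors > 0)
        return RC_ERRORS;
    if (warnings > 0)
        return RC_WARNINGS;
    return RC_OK;
}
```

A regression script can then sort results by exit status alone, without parsing the compiler's messages.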

Bill Wendling suggested a random fortran program generator for testing that Mark Dewing promptly implemented in python. The program can be found here with a template file here. The basic idea is to start with a template file that generates lots of pseudorandom fortran programs that can be tested overnight.

September 27

Finished the symbol documentation; it is located in doc/syms in the depository. I modified the source to conform to the document; the biggest change is how functions and subroutines are represented. The code compiles, but it probably does not work.

September 25

Worked more on the symbol document, it will be ready for prime time soon. I've discovered some misconceptions that I had about symbols. For instance, if a name is a defined operator, it can be anything else at the same time... While it would have been nice to get this right at the start, one is not aware of all the issues at the start.

In other news, the FSF has received a copyright assignment from Michael Richmond... Hooray!

September 24

Michael Richmond sent a bug in regarding a problem with dummy procedures. Since this is the third or so time that I've had to try and get this right, more thought was clearly required. The problem of representing dummy procedures (vs real ones) led to the realization that the g95_symbol structure has started to get out of hand. I've therefore started working on a document that describes this rather central structure.

September 23

I've activated Sourceforge's CVS depository and imported the current sources into it. The direct link to the CVS archive is here (the main menu has also been changed). Also fixed a broken link to the J3 website.

September 22

Driving into Las Vegas at night is a treat. Phoenix is not so great-- there are too many hills in the way for you to see any of it before you are actually in it, but Las Vegas is different. Approaching from the southeast, you can see it about seventy miles out, a glow over intervening hills. You can tell it is Las Vegas by the searchlight built into the pyramidion of the Luxor Hotel. After passing over Hoover Dam, you go down and around on a twisty road, eventually coming to a pass. After coming up a small hill after the pass, you can suddenly see the whole place, all lit up, spread out over a huge valley.

Finding the hotel was no problem, since it was located close to the strip, and that was easy to spot from up on that hill. I had a little problem registering, since they managed to mangle my name in a way that I actually haven't heard before. The hotel was a nice place-- lots of room, and a free breakfast in the morning. From the closing business, I understand that J3 is going to have several more meetings there.

The meeting itself had about sixteen attendees. About a third of the members were from vendors-- Compaq, Sun, Intel, Cray, HP, and NAG. The other members were from all over-- NASA, JPL, a couple university people and other companies.

The process of creating a new language specification amounts to writing a large and complicated book. People who want a new feature, or just to clear something up propose a "paper", which is given a serial number and written up under that number. The paper goes to a subgroup that has to pass the paper by vote before it is reported back to the full committee. At the last meeting there were four subgroups-- "data", which deals with the main f2k language issues, "interop" which is currently finalizing the interoperability with the C language, "interp" (interpretation) which interprets the holes in fortran 90/95 and tries to clear up f2k as appropriate.

Once a paper has been passed by a committee, the author finalizes any edits and puts printouts on the table at the end of the day for people to read for a vote the next morning. It's a grueling schedule. Meeting and talking all day, reading and writing half the night. I heard several complaints from people who were unable to satisfy their urges for compulsive drinking and gambling.

Most of the committee had already heard of the g95 project and I let them know where we stand at the moment. Several of them wished us luck. I talked little about how g95 works internally. The guy from Sun asked me how many compilers I'd written before. I had to say "Uhhh, none, this is my first one".

I've also absorbed enough fortran 95 to be able to understand most of what was going on and even contribute in small ways and even participate in a few "straw votes", which are nonbinding votes used to get a sense of where people stand on issues-- they are the main reason that most things pass by unanimous consent.

The discussions were very amicable. Unlike a lot of other gatherings, the people there were willing to be persuaded. It was explained to me that corporate representatives are occasionally required by their companies to vote a particular way, but my impression is that the companies mainly want a representative just so they know where the leading edge of fortran is, as opposed to defining it. As far as I could tell, everyone was there because they wanted to be and were all interested in making sure that f2k is as good as it can be.

As far as technical issues go, f2k is becoming a huge language. People were saying that f2k is to f90/f95 as f90 was to f77. New things include polymorphism, constructor and destructor functions, user definable data transfer functions that are called during I/O depending on what is being output, stream I/O (no records) and interoperability with C. I'm probably missing things that weren't discussed at that particular meeting. But there is a lot of stuff here.

Another thing I got out of the conference was a better understanding of the idiosyncratic nature of fortran. For example, in C, you can do something unexpected like:

*strchr(string, 'x') = '\0';

because strchr() returns a perfectly good character pointer, which can be dereferenced by the '*'. In fortran, the expression

(/ 1, 2, 3, 4 /)  

is an array constant, but you can't write something like

do i=1, 4
print *, (/ 1, 2, 3, 4 /)(i)
enddo

because subscripts are only allowed after a named variable instead of any old array expression. And there are a lot of similarly weird restrictions. What has happened is that the language has just had a lot of things stuck on it gradually, unlike the evolution of C before it became standardized.

I had a really good time and will head back when I can.

September 19 3:00pm MST

Michael Richmond pointed out some debug code left in the format parser, which has been removed.

There will be no updates until the weekend. The car is gassed up, I'm packed and heading off to the Fortran J3 meeting in Las Vegas. I'll give a report of some sort when I return.

September 17

The long standing problem of correctly pointing to errors in long format strings has been fixed. Format strings, whether they appear in FORMAT statements or string constants, are now read directly from the source file. This allows an accurate error locus to be used.

September 16

Fixed the last of the problems that Michael Richmond reported a while ago. Also overhauled the ENTRY matcher, changing how an ENTRY is stored. Michael also reported that alternate return labels in a CALL statement were not causing a target label to be marked as "referenced". This has been fixed.

Dan Nicolaescu reported a core dump that was caused by copied code that converted a real to an integer.

Updated the link to the X3J3 committee's website. They're at www.j3-fortran.org now.

September 14

Michael Richmond sent in a fix for the crash reported the other day by Ian Watson. The problem was with:

SUBROUTINE FOO(I)  

The parser complained about a premature end of file, and tried to use the %C code, which signals the error print subroutine to insert the current locus. The problem was that the current locus wasn't pointing anywhere. Now we just print the filename of the offending file.

September 12

Finished separating the matching of array references and comparison of the reference to the specification, started debugging it a bit. I'll try it out on the code that actually caused the separation tomorrow.

September 11

Boy, two-year-olds can sure be a handful...

Michael Richmond sent in an email a week and a half ago detailing some problems. The first dealt with how a name that was marked as EXTERNAL was interpreted. We actually have to check the next character following the name to see if it is a function call or a procedure variable.

The other problem he pointed out is causing a lot more work. I had thought that an array reference could only follow an array specification. This isn't quite true. The counterexample is:

EQUIVALENCE (NX(1), X)
INTEGER NX(1)

This means that matching an array reference has to be decoupled from comparing an array reference and an array specification...

September 7

Unfortunately, nothing on g95 for the next couple of days. I am going to visit my nieces and nephew this weekend.

September 6

Started working on raising numbers to integer powers. This required a generalization of g95_int_expr() to create expression nodes of any numeric type. The new function is g95_constant_expr() and replaces g95_int_expr().
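Raising a constant to an integer power is commonly done by repeated squaring. What follows is a minimal sketch of that technique on plain C integers, not g95's actual code: int_pow is a hypothetical helper, and the real simplifier presumably works on GMP values held in the expression nodes that g95_constant_expr() creates.

```c
#include <stdint.h>

/* Hypothetical helper: compute base**exponent for exponent >= 0 by
   repeated squaring.  Overflow is not checked in this sketch. */
int64_t int_pow(int64_t base, unsigned exponent)
{
    int64_t result = 1;

    while (exponent != 0) {
        if (exponent & 1)
            result *= base;      /* this bit of the exponent is set */
        exponent >>= 1;
        if (exponent != 0)
            base *= base;        /* square for the next bit */
    }

    return result;
}
```

The loop runs once per bit of the exponent, so something like 2**10 takes four passes instead of ten multiplications in a row.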

September 4

Tobi Schlüter sent a patch that fixed an incorrect error when making sure the ADVANCE tag wasn't mixed with list formatted io.

September 3

Tobi Schlüter sent a patch that fixed the KIND of a complex number when one of the components was a literal that looked like an integer. Normally I deal with mail in more or less chronological order, but patches get priority.

Bill Wendling sent a patch that fixed a fall-through in the switch() that controlled resolution-- the case for DO resolution was falling through to the ALLOCATE resolution.
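For anyone who hasn't been bitten by this class of bug, here is a minimal sketch of a switch() fall-through. The enum, the structure and the counters are all hypothetical stand-ins; g95's resolver uses its own names and does real work in each case.

```c
/* Hypothetical statement codes and counters, for illustration only. */
enum stmt_kind { ST_DO, ST_ALLOCATE, ST_OTHER };

struct resolve_counts {
    int do_loops;
    int allocates;
};

void resolve_stmt(enum stmt_kind kind, struct resolve_counts *counts)
{
    switch (kind) {
    case ST_DO:
        counts->do_loops++;
        break;              /* without this break, control falls
                               through into the ALLOCATE case */
    case ST_ALLOCATE:
        counts->allocates++;
        break;

    default:
        break;
    }
}
```

With the first break removed, every DO statement would also be counted as an ALLOCATE, and the compiler would accept the code without complaint.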

Dan Nicolaescu pointed out a few days ago that trying to print a variable of type COMPLEX failed with an internal error. I've fixed this, but the corresponding fix for derived types is going to have to wait until we get into code generation.

Jos Bergervoet sent in a problem with the RESULT keyword in function declarations. The return code of the subroutine that actually set the 'result' attribute was incorrectly checked. He also pointed out that array constructors weren't being parsed correctly. This was due to setting the wrong expression type and also uncovered the problem of not freeing the constructor itself. All fixed.

I've also taken the time to split expr.c, which was becoming quite large, into two smaller pieces. The first piece is a new file, primary.c, which takes care of matching primary expressions like integers, reals, complex, logicals, array constructors, etc. The second piece consists of the remaining functions in expr.c, which handle lots of non-matching expression things like allocating, freeing, copying, resolving, simplifying and converting.

The other task I started on was a cleanup of the prototypes in g95.h. It's been a long time since I've done this and a lot of the prototypes were out of order, in other source files or just plain no longer used. I only got through symbol.c, and there are still probably 2/3 of the source left.

In short, lots of changes all over today.

August 31

Michael Richmond wrote in three days ago to point out some problems, some of which have already been noticed and fixed. Of those that weren't, he noted that g95 warned if (insignificant) spaces were being truncated when a line is read. I've changed this to warn only if nonspace characters are seen. He also pointed out that ENTRY names within a FUNCTION are not given the "variable" attribute that they need (and I'd guess that the RESULT keyword parsing probably isn't there either). It's also obvious that more work is going to be needed on the ENTRY statement.

August 30

Fixed the (a.or.b.and.c) problem first reported by Dan Nicolaescu... It took about ten minutes of staring alternately at the code and the standard, and the fix involved replacing an "and" with an "or".
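In fortran, .and. binds more tightly than .or., so a.or.b.and.c has to group as a.or.(b.and.c). A minimal sketch of how a recursive-descent matcher gets that grouping is shown here. It is a toy that only understands the letters T and F, and every name in it is hypothetical rather than taken from g95's expression matcher.

```c
#include <string.h>

/* Hypothetical mini-parser over the literals T and F joined by
   .and. and .or.; it only illustrates operator precedence. */
static const char *pos;

static int match_literal(void)
{
    int value = (*pos == 'T');   /* anything else counts as false */

    if (*pos != '\0')
        pos++;
    return value;
}

/* and-operand := literal { ".and." literal } */
static int match_and(void)
{
    int value = match_literal();

    while (strncmp(pos, ".and.", 5) == 0) {
        int rhs;

        pos += 5;
        rhs = match_literal();   /* always consume the operand */
        value = value && rhs;
    }
    return value;
}

/* or-operand := and-operand { ".or." and-operand } */
static int match_or(void)
{
    int value = match_and();

    while (strncmp(pos, ".or.", 4) == 0) {
        int rhs;

        pos += 4;
        rhs = match_and();       /* the tighter level is parsed first */
        value = value || rhs;
    }
    return value;
}

int eval_logical(const char *text)
{
    pos = text;
    return match_or();
}
```

Because match_or() calls match_and() for each of its operands, "T.or.F.and.F" comes out true, where a naive left-to-right reading would give false.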

The last problem had to do with a code fragment that looked like

      DO 20 J=LL,L4
      LB=J
   20 IF (X(LB).LT.0) GO TO 310

As far as I can tell from the standard, the GOTO statement is not allowed to be a part of the statement that terminates the nonblock DO-loop. If someone knows otherwise, let me know.

I also had a chance to look at Katherine Holcomb's progress on adding to intrinsic.c. She's got range-checking done for a few of the intrinsics, and is also working on type conversion intrinsics like CMPLX. It looks good so far and she had a couple questions that I answered (probably should have cc'ed it to the mail list).

August 29

Jos Bergervoet reported a successful compile of g95 under Solaris (64 bits!) and HP-UX and pointed out that I forgot to mention which version of gmp is/will be needed.

Marc Dejardin pointed out a problem with my INCLUDE fix the other day-- I applied the case preservation to the keyword itself. Internally, the keyword is stored in lower case, so an upper case INCLUDE line was not found....

Dan Nicolaescu sent two more bugs, one of which I fixed. The other problem has to do with parsing a logical expression of the form

if (a.or.b.and.c) stop  

I didn't have time to fix this tonight. I am currently about three days behind on g95-related mail. Katherine Holcomb sent an update to intrinsic.c that I'll look at next.

August 28

Alaeddin Aydiner wrote in pointing out some problems-- g95 didn't correctly back up when a match with an I/O list failed, and there was a problem with array matching (and, as it turned out, printing).

Martien Hulsen found a problem with the INCLUDE statement-- it was folding case, thereby mangling the filename into what could easily be a different filename. This problem is also present in MODULE names and the USE statement-- some special handling is needed for these "symbols". He also pointed out that the matcher for iterators would not accept a space after the start expression-- That problem has been in there for a while. I've made the iterator matching much more robust.

Dan Nicolaescu reported another subtle problem:

REAL*8 Z(M)
INTEGER M

Complained about duplicate application of the integer attribute. The problem was that g95 was assuming that untyped variables could be given their default type when first seen. This usually works, but the above program provides a counterexample. Dan has actually been parsing spec95 with g95. He reports that we have under a dozen distinct errors left.

August 27

Dan Nicolaescu wrote in pointing out a problem with my fix-- the complex constants were still being matched too aggressively. I added code to the error handler to allow pushing and popping of error messages so that a matching subroutine can generate its own errors and still decide to return MATCH_NO.
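The idea of pushing and popping error messages can be sketched roughly as follows. This is an illustration under my own assumptions, not the code in g95's error handler: the error_buffer structure and the set_error, push_error, pop_error and pending_error functions are all hypothetical.

```c
#include <string.h>

#define ERROR_LEN 256

/* Hypothetical error buffer; the names and layout are illustrative. */
struct error_buffer {
    int  flag;                  /* nonzero if a message is pending */
    char message[ERROR_LEN];
};

static struct error_buffer current_error;

void set_error(const char *text)
{
    current_error.flag = 1;
    strncpy(current_error.message, text, ERROR_LEN - 1);
    current_error.message[ERROR_LEN - 1] = '\0';
}

/* Save the pending error (if any) and start with a clean slate. */
void push_error(struct error_buffer *saved)
{
    *saved = current_error;
    current_error.flag = 0;
    current_error.message[0] = '\0';
}

/* Discard anything generated since push_error() and restore the
   state that was saved at that point. */
void pop_error(const struct error_buffer *saved)
{
    current_error = *saved;
}

/* Returns the pending message, or NULL if there is none. */
const char *pending_error(void)
{
    return current_error.flag ? current_error.message : NULL;
}
```

A matcher that is only guessing at what it is looking at would call push_error() on entry and pop_error() just before returning MATCH_NO, so its private complaints never reach the user.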

August 26

Bill Wendling sent a patch that adds parsing and printing of substring constants and cleaned up a few comments. Thanks Bill!

Laurent Klinger reported three bugs-- symbolic constants inside of complex constants were not handled correctly and namelist variables on the right hand side of a NML= tag were not recognized for what they were, both of which were fixed. The other problem he found was that symbols are not being created in the right namespaces... this is a big problem which I am going to wait on.

Michael Richmond reported a couple problems that have been fixed: Tabs were not always being expanded in the statement label region for fixed-form source, and the blanket form of the SAVE statement was not implemented at all. He also was the first to point out a problem with the parsing of the data-transfer statements:

READ(10,100) X  

The problem is that (10,100) is a perfectly good complex constant, and was being parsed as a unit number in the form of the READ statement without the I/O control list. This worked fine when I originally implemented the I/O matchers, because matching complex constants was not implemented...:). The data transfer statements are now handled the same way as the other I/O statements-- first we check for a '(' and a subsequent control list. If the '(' is not found, then the alternate form is checked for.

Michael also pointed out that a substring reference of an otherwise unknown symbol ran into problems because g95 decided too quickly that the substring reference was really a function reference. This is now fixed, but I suspect the algorithm for deciding what an otherwise unknown symbol is may have to undergo more revision. Also fixed a problem he pointed out regarding FORMAT and ENTRY statements that preceded the specification statements. The last problem was requiring another edit descriptor after the 'P' descriptor. I've also added a -ffixed-line-length-80 option which duplicates the functionality of the same g77 option at his request.

Added resolution functions for subtype references and array references. Added several constraints relating to the data-transfer statements. Added an optional warning that lets the user know that a source line has been truncated due to being too long.

Dan Nicolaescu found a problem with the error printing in the format-checker, which is now fixed. He also uncovered the fact that g95 doesn't regard a constant raised to an integer power as an initialization expression. Dan also sent a problem regarding a single-statement IF-clause. The clause in question had a PAUSE statement as its action clause, which has been removed in fortran 95. The real problem was a horribly misleading error message that has been fixed.

Marc Dejardin pointed out a bug in how g95 handles comments. Fixed g95_match_eos() so that it eats any trailing comment in the current line that it is parsing. The comments are slightly tricky-- in free mode, a 'c' at the start of the line does *not* start a comment.
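The difference between the two source forms can be sketched like this. It is a simplified illustration, not g95's scanner: is_comment_line is a hypothetical helper that only looks at how a line starts and ignores blank lines and trailing comments.

```c
/* Hypothetical helper: returns nonzero if `line` is a comment line.
   fixed_form selects fixed-form rules, otherwise free-form rules. */
int is_comment_line(const char *line, int fixed_form)
{
    if (fixed_form) {
        /* Fixed form: 'C', 'c', '*' or '!' in column one. */
        return line[0] == 'C' || line[0] == 'c' ||
               line[0] == '*' || line[0] == '!';
    }

    /* Free form: only '!' starts a comment, after optional blanks. */
    while (*line == ' ' || *line == '\t')
        line++;

    return *line == '!';
}
```

In free form a line such as "c = a + b" is an ordinary assignment, which is exactly the case that trips up a scanner written with fixed form in mind.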

G95 is 21,000 lines long.

August 23

Laurent Klinger, Dan Nicolaescu and Michael Richmond were kind enough to send in a couple of bug reports. Laurent reported a successful compile of g95 on a Sun Ultra3 running Solaris 2.6. I fixed a problem noted by Dan having to do with the IMPLICIT statement. He and Michael both noted problems with the READ statement-- I looked at the code, and somehow I deleted the part that looks for a unit number, no doubt a couple days ago when I was messing with the ordering of parsing IO tags.

I haven't had a chance to deal with all these reports yet, but I will get to them.

August 22

Toon Moene's copyright assignment has been received by the FSF. Hooray!

August 21

Bunch of web stuff tonight. The links on the source pages to the individual files appear to have been broken but are now fixed. I have uploaded a Linux x86 binary linked against glibc2, so that non-hackers can beat on g95.

August 19

Niels Jensen let me know (again) that I/O statements were leaving some unwanted symbols in the symbol table. This happened when g95 tried to match a unit number when there was none. For example:

OPEN(FORM="formatted", UNIT=6, FILE='/dev/null')  

matched a variable reference named 'FORM', then bombed at the equals sign. The expression was freed, but there was still a symbol named FORM in the symbol table. I've fixed this by changing the ordering of the matching so that tags are matched first. These only leave symbols if the '=' sign is seen, so no unwanted symbols are created. This has actually made things cleaner.

Added parsing of the IOLENGTH form of the INQUIRE statement. This will end up generating calls to the I/O library which will count the number of characters generated and throw the output away.

Fixed the problem with a function interface not creating a function reference in the parent namespace. The problem was that inside a function without a RESULT being specified, the function name itself is a variable. This was being set in the parent namespace when an interface was being compiled.

Fixed the error recovery subroutine within the scanner so that it would eat the rest of a continuation line as well as the current line. If an error is now generated within a line, the next line won't generate an "unclassifiable statement" error.

Added a new basic type, BT_PROCEDURE. This is necessary when passing procedures as actual arguments and also procedure assignment statements.

Added a copy_ref() subroutine that recursively copies lists of g95_ref structures. This was needed to implement copying of expression nodes that represent variables. Also implemented a similar function to copy the constructor structures.
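A recursive copy of a singly linked list looks roughly like the sketch below. The struct ref here is a hypothetical stand-in for g95_ref, which really holds array, component and substring data rather than a single integer, and free_ref is an assumed companion function.

```c
#include <stdlib.h>

/* Hypothetical stand-in for g95_ref. */
struct ref {
    int         kind;
    struct ref *next;
};

/* Recursively copy a singly linked list of references.  Returns NULL
   for an empty list.  An allocation failure simply truncates the
   copy in this sketch. */
struct ref *copy_ref(const struct ref *src)
{
    struct ref *dest;

    if (src == NULL)
        return NULL;

    dest = malloc(sizeof(*dest));
    if (dest == NULL)
        return NULL;

    dest->kind = src->kind;
    dest->next = copy_ref(src->next);   /* copy the rest of the list */
    return dest;
}

/* Release a list produced by copy_ref(). */
void free_ref(struct ref *p)
{
    while (p != NULL) {
        struct ref *next = p->next;

        free(p);
        p = next;
    }
}
```

The copy shares nothing with the original, so freeing one expression node cannot leave the other pointing at dead references.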

The 16K line Runge-Kutta code I've been mentioning for about the last week is now fully parsed by g95, though to be fair, the code consists of copies of the same code in one, two and three dimensions and I've also edited it a bit so that modules are not needed. My Pentium-120 parses all 16K lines in about 13 seconds, giving about 1200 lines/second on an underpowered machine by modern standards.

August 18

Niels Jensen pointed out (two days ago) that labels associated with DO-loops and the ERR, EOF and EOR associated with various I/O statements generate errors about "labels used but never referenced". I fixed that, but then got interrupted by a storm.

Added a fix for the interfaces that involves copying symbols from one namespace to another, but it didn't appear to work.

August 16

Finished debugging the resolution subroutines for array and structure constructors. Found and fixed a few small problems with compiling the Runge-Kutta code. The big problem has to do with interfaces-- the name of a function or subroutine has to go in the interface's parent namespace, not the namespace of the interface.

Also added __DATE__ and __TIME__ macros to the status message. These will come in useful later.

August 15

Not much time for g95 tonight. I've updated the BUGS file to reflect things that have been taken care of since I last looked at it. It looks like the only major thing left as far as parsing is concerned is reading and writing module information. Given the ease of writing subroutines to read and write lisp-style lists, it shouldn't be that hard. A major thing that does have to happen is how the symbol table is organized-- after a USE, you can have two or more names that reference the same entity...

August 14

Debugged the structure constructor matchers, started debugging the resolution subroutines for the constructors, both arrays and structures.

August 13

Added parsing of structure constructors. Not tested at all yet.

August 11

There have been some emails flying back and forth regarding proposals for how g95 should do floating point arithmetic that haven't filtered up here. The upshot is that a new type has recently been donated to GMP, the mpfr_t. This type is mpf (floating point) and the 'r' is for 'rounded'.

For each mpfr_t, the number of bits of precision can be set. If we know what the target machine is, then this determines how many bits of precision each kind has. It also means that we portably emulate arithmetic on the target machine using GMP.

Kate Hedström has compiled the latest GMP with g95 and I have repeated her success. I have updated the notes on compiling g95 on the source page. The things left to do here are replacing the mpf_t's with mpfr_t's, worrying about how many bits are associated with each kind and (later) emulating infinities.

Katherine Holcomb also sent mail asking some questions about how intrinsic.c works.

August 10

Finished debugging the parsing of array constructors.

August 8

Earlier this year, a list of "The Top 10 Algorithms" was released and caused quite a stir on comp.lang.fortran as well as other places. I was browsing the list in the January issue of "Computing in Science and Engineering" which had an article on each algorithm, one of which was the Fortran I compiler. Some interesting facts about the compiler were that it took 18 man-years to write, was 23,500 assembly statements long and performed many optimizations considered to be sophisticated by today's standards. From the article, it also appears that g95 parses fortran programs much like Fortran I did.

The full article can be found here. The bad news is that you have to connect from someplace that subscribes to CSE or you'll be asked for your credit card number.

Unfortunately, no time for g95 today.

August 6

It became obvious that the current plan for storing expression nodes wasn't going to work. In particular it wouldn't have been able to represent an array of structures, so I've come up with something else. It's not that different, but it looks like representing arrays is going to be challenging however it is implemented. The code doesn't compile at the moment because I'm still switching parts of it.

I've halfway debugged the parsing of array constructors. I needed to modify the expression matcher in the process-- Something like:

(/ 2 /)

was being parsed as a "2", followed by an "/", which then complained about the missing denominator. The solution was of course not to complain about a missing denominator and restore the parse pointer to the '/' character. If a denominator is really left out of an expression, then the parser will generate an error when the '/' is reread.

August 4

Adding the array constructors also requires changes to the expression node in order to store the new data. I've started on that. The same structures will also be used to store structure constructors. Fixed the messed up dates for the last few days...

August 3

Added parsing of array constructors. It isn't called yet, but it compiles and sort of looks like it works. The code works almost identically to the subroutines that match similar constructions for IO-loops.

August 2

Ok, back. I've been really busy lately-- not a lot of time tonight, but I did fix a few bugs that showed up in the RK code. I removed the symbol_attribute field from the expression node that I mentioned the other day. It turns out that this really needs to be calculated. In an expression like A(1) the subscript actually causes the dimension attribute to be removed from the overall attribute.

July 30

My RK code showed a deficiency in how a variable expression is stored-- something like "VAR%NAME" does not have the attributes of "VAR", but rather of its member "NAME". I fixed it by creating another member in the expression node, but I think I'm just going to change it to a subroutine call, calculated when needed.

Tobi sent a bunch of small patches related to the intrinsic functions that he has been working on lately. This area needs a little work too.

July 29

Lots of stuff today. The biggest changes were in the parsing module. The checking for proper statement ordering is now in a single place and this made several of the program unit parsers much smaller.

I implemented the last statement matcher, for the USE statement. After that I started running g95 on a fortran 90 Runge-Kutta integrator that is about 16k lines long. This pointed out a lot of previously undiscovered problems. I can get to about line 500, after working around the fact that the USE statement doesn't actually do anything.

I also applied a small patch sent by Tobi, and will probably start on Kate's floatlib patch soon.

July 27

No one commented on the plan for gmp/floatlib, so it must have been a pretty good idea.

Finished debugging the subroutine for parsing a contained subprogram. Added a parser for the MODULE statement, which is not debugged yet. I think this may be the last statement matcher that has to be written. I'll start posting binary releases that anyone can test as soon as we have all statements (but not every fortran 95 construct) being matched.

I think the thing I will do after this is finished is to better document g95's internals so that it will be easier for others to contribute, both from a matching and code generation standpoint.

July 26

Decided what to do about the current GMP/floatlib dilemma: We'll use floatlib for real numbers (the newer version does more than I thought) and keep GMP for integers.

Applied a patch that Tobi sent in a few days ago that I didn't understand real well, and also moved the sort_actual function within intrinsic.c to make argument checking easier.

Added code for parsing contained subroutines. It is now called, but never tested.

July 25

I didn't have much time to worry about the GMP/floatlib issue today. I only made a small addition to parse.c, two functions needed to parse contained program units.

July 23

Finished debugging the parsing of the WHERE statement, added parsing of the FORALL statement. G95 is now 20,000 lines long.

Kate Hedström sent in a patch that starts to get rid of the GMP library. I've been thinking about this issue of doing target arithmetic for a couple days now and am still unsure of which way to go. On one hand, GMP is easy and portable. On the other hand, this sort of target arithmetic is closer to how the target actually does its math.

I'll post something to the mail list soon.

On a positive, if only slightly related note, I've gotten the XFree86 4.0 Direct Rendering to talk to my Voodoo 3 video card. It speeds up the GL 'gears' demo by a factor of three. Even better is that it also appears to work fine with IBM's Open DX visualization package. For scientific computing, there isn't too much that isn't freely available for PCs these days.

July 22

Tobi Schlüter sent in a patch to add a kind for quad precision reals needed by Bertrand Joël.

I've started work on parsing WHERE blocks.

July 20

Bertrand Joël wrote in with a problem of IMPLICIT statements not being recognized with interfaces. Tobi Schlüter sent in a patch that added IMPLICIT statements to interfaces. After applying it, I added some more code to require that the implicit statements come before all of the other statements. I also copied this code to the BLOCK DATA parsing subroutine that I added yesterday.

Kate Hedström wrote in with a few miscellaneous problems, some culled from her previous work on the g77 torture test suite. These included the program consisting of a single END statement, an illegal expression that had a bad error message and a problem with the substring matcher.

July 19

The problems mentioned yesterday appear to be solved. Worked a bit today on parsing BLOCK DATA program units. This involved adding a new parsing subroutine to parse.c, and some logic to check_conflict to complain about illegal attributes found in these program units. Haven't had a chance to test this stuff yet. Got some email from the FSF today, they've received copyright assignment forms from:

* Joseph Cermak
* Katherine Hedstrom
* William C. Wendling Jr.

Welcome aboard!

July 18

I think I've got things back under control with respect to the line wrapping. My thesis project is parsed without any problems and as far as I can tell everything works again. I got a start on fixing the problems with the FORMAT statement, but ran out of time today.

July 17

Ok, I remembered why I didn't advance lines automatically. The problem is that if a line has something (say, an expression) that ends prematurely, an error is generated that points to the line following the error. I've put a comment at a strategic place to prevent someone else from going down this road again.

This sort of seesaw is a bad sign-- It means I haven't thought things out enough before proceeding. Unfortunately, screwed up error loci outweigh masked parser problems, so it is back the other way again. The code currently compiles, but it generates strange errors in strange places.

Wrote Bertrand Joël concerning a problem about a failure to match the end of statement after a FORMAT statement. The fix will have to wait.

July 16

Catching up on emails today. Tobi sent in a problem that had to do with random kind data being present in derived type typespecs. I think I fixed the problem, but I don't have his test case.

Tobi also sent a patch that set the expression locus in g95_match_rvalue. He also found a bug that prevented a CALL to a subroutine without any arguments from being correctly recognized. The reason had to do with matching an end-of-statement twice. One of g95's matching conventions is that when you match something, the scanner points just past the matched thing.

The reason EOS was matchable twice was that g95's scanner doesn't move to the next line without an explicit call, done between full statement matches.

Originally, this was done to make things clearer, but now it is obvious that this leads to situations where it masks parsing bugs in higher level subroutines. I spent the rest of today's work on g95 removing this misfeature.

Now, the scanner returns a '\n' when it hits the end of a line (newlines are not actually stored) and moves to the next line automatically. As before, end of file returns an infinite stream of newlines.
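The behavior described in this entry can be sketched as follows. The scanner structure and next_char function are hypothetical and much simpler than g95_next_char, which also has to deal with continuation lines, comments and include files.

```c
#include <stddef.h>

/* Hypothetical scanner state over an array of stored lines.  The
   lines themselves contain no newline characters. */
struct scanner {
    const char **lines;
    size_t       nlines;
    size_t       line;     /* index of the current line */
    size_t       col;      /* offset within the current line */
};

int next_char(struct scanner *s)
{
    char c;

    if (s->line >= s->nlines)
        return '\n';                /* end of file: newlines forever */

    c = s->lines[s->line][s->col];
    if (c == '\0') {                /* end of line: advance by itself */
        s->line++;
        s->col = 0;
        return '\n';
    }

    s->col++;
    return c;
}
```

Since a newline is consumed like any other character, a second attempt to match the same end of statement sees the start of the next line instead of succeeding again.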

This gave me the opportunity to add a single subroutine responsible for skipping comments. This will be useful if we ever need to implement directives in the form of special comments. The g95_match_eos() subroutine was upgraded to match multiple semicolons if present and several bugs involved in skipping comments were removed.

I'm pretty sure that g95_next_char and below are working correctly-- I have only to check that include functionality is still working as well as the things below g95_next_statement.

July 13

Fixed the problem reported by Bertrand Joël yesterday. This involved a total rewrite of match_attr_spec() for the sake of the CHARACTER type. The problem was that until we see the double colon, we can't be sure we're looking at an attribute specification if the type is a character. The standard allows something like:

CHARACTER*(*), save, target, parameter  

Which defines three variables named save, target and parameter! The analogous code with any other type

INTEGER, save, target, parameter  

is illegal. I had originally assumed that seeing a specification keyword (which starts with a comma) guaranteed that we were seeing an attribute specification, but this turns out to not be the case.

What the new match_attr_spec has to do is to read each of the attribute keywords, storing them until we get a double colon. At this point, we can start making sure the attributes make sense. Before this we can only return MATCH_NO.
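The "store everything until the double colon" approach can be sketched like this. It is a toy under my own assumptions, not the real match_attr_spec(): collect_attrs and the constants are hypothetical, only bare keywords are handled (nothing like DIMENSION(10)), and no checking of the attributes themselves is attempted.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ATTRS 8
#define ATTR_LEN  32

enum match { MATCH_NO, MATCH_YES };   /* hypothetical result codes */

/* Collect ", keyword" items from `text` until a double colon is seen.
   Nothing is reported to the caller unless the "::" is found. */
enum match collect_attrs(const char *text,
                         char attrs[MAX_ATTRS][ATTR_LEN], int *count)
{
    char pending[MAX_ATTRS][ATTR_LEN];
    int n = 0;

    for (;;) {
        size_t len = 0;

        while (*text == ' ')
            text++;

        if (text[0] == ':' && text[1] == ':') {
            /* Now we know these really were attributes. */
            memcpy(attrs, pending, (size_t) n * ATTR_LEN);
            *count = n;
            return MATCH_YES;
        }

        if (*text != ',' || n == MAX_ATTRS)
            return MATCH_NO;
        text++;

        while (*text == ' ')
            text++;

        while (isalpha((unsigned char) *text) && len < ATTR_LEN - 1)
            pending[n][len++] = *text++;
        if (len == 0)
            return MATCH_NO;

        pending[n][len] = '\0';
        n++;
    }
}
```

Given the text that follows the type, ", save, target :: x" yields two attributes, while ", save, target, parameter" returns MATCH_NO and leaves the caller's storage untouched, so the names can still be treated as variables.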

July 12

Bertrand Joël sent in a bug in the parser regarding character type declarations. The confusion resulted from the mixing of the old and new style of declarations. No time to fix it tonight.

Tobi Schlüter sent a small patch that adds locus information to the expression node generated by matching a variable.

July 11

No update last night, my monitor had some problems with a lot of internal electrical arcing. I went shopping and now have a new, much better monitor. It is amazing how computer prices drop-- I paid about half as much for this new 17" as I paid for my old 15" several years ago.

And one of the best parts?? That "new computer" smell!

Anyway, I worked a little bit on complex arithmetic. We can now add, subtract, multiply, divide and compare complex numbers. I am thinking the way to resolve the substring dilemma is to go the function route. It preserves the constancy of EXPR_CONSTANT and we can still do everything that has to be done.

G95 is now 19,000 lines long.

July 9

Worked a bit on parsing substring references. There is a bunch of code, but it is not called, because I have hit something of a dilemma. The problem has to do with how to store the case of a substring reference that appears after a constant string. Normally, references to subobjects, like structure references, array references and substring references are stored in a singly linked list of g95_ref structures that are attached to an expression node that points to the parent symbol.

The "obvious" way to store a substring of a string constant is to just add the reference structure (which holds the start and end indexes) to the expression node structure that represents the constant string. The trouble is, the node has a type of EXPR_CONSTANT, which is no longer really a constant, since the range can be composed of other variables. It isn't a variable either, since it cannot be assigned to.

In my experience, changing the meaning of flags in subtle ways like this is a bad idea-- it can very easily break assumptions that one has long since forgotten about. The other, equally disgusting ways of doing this are: making the type of the node EXPR_OP, and defining a new intrinsic "substring" operator, or perhaps the best way is to make the expression node an EXPR_FUNCTION node, with a pre-resolved function that "does" substrings. In any case, the best thing to do at the moment is sleep on it and make a decision later.

The analogous case with arrays

do i=1, 5
print *, (/ 5, 4, 3, 2, 1 /)(i)
enddo

is specifically prohibited by the standard for some reason.

Fiddled with some intrinsic function resolution issues and realized that I was over my head there too-- I think one of the biggest mistakes that a lot of people make in programming is just jumping in and writing code without a lot of thought about where they are going.

July 8

Potpourri today. Fixed several resolution issues, both in and out of intrinsic.c. Added checking for the square root intrinsic-- basically just comparing against zero. Fixed the checking for statement labels-- this makes sure that a label is being used consistently as well as being defined and referenced. Added the matching for binary, octal and hexadecimal constants into the matching of constants, not just DATA statements.

July 6

Finished the first half of the intrinsic conversion stuff. An intrinsic conversion is now converted to a function call to a special intrinsic that does the conversion at run time or converts a constant at compile time. The REAL, INT and CMPLX "functions" cause one of these functions to be generated depending on the type of its argument. The scheme is also meant to allow extension of the basic types into kinds.

Tobi Schlüter sent a patch that matches binary, octal and hexadecimal constants. Normally, these are only allowed in DATA statements, but it might be nice to allow them anywhere an integer constant is allowed.
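As a rough illustration of what matching such a constant involves, here is a minimal sketch in C. It assumes the Fortran 95 forms B'...', O'...' and Z'...' (with either apostrophes or quotes as delimiters); the name parse_boz is made up for this example and is not taken from the patch or from g95.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not g95's matcher: converts a literal such as
 * B'1010', O'17' or Z'FF' to its value.  Returns 1 on success and 0 if
 * the text is not a well-formed constant.  Overflow is not checked. */
int parse_boz(const char *s, unsigned long *value)
{
    int radix, digit;
    char delim;
    unsigned long v = 0;
    size_t i, ndigits = 0;

    /* The leading letter selects the radix. */
    switch (toupper((unsigned char) s[0])) {
    case 'B': radix = 2;  break;
    case 'O': radix = 8;  break;
    case 'Z': radix = 16; break;
    default:  return 0;
    }

    delim = s[1];
    if (delim != '\'' && delim != '"')
        return 0;

    /* Accumulate digits until the closing delimiter. */
    for (i = 2; s[i] != '\0' && s[i] != delim; i++) {
        int c = toupper((unsigned char) s[i]);

        if (isdigit(c))
            digit = c - '0';
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return 0;

        if (digit >= radix)
            return 0;              /* e.g. the digit 2 in a binary constant */

        v = v * radix + digit;
        ndigits++;
    }

    /* Require the closing delimiter, at least one digit, nothing after. */
    if (s[i] != delim || s[i + 1] != '\0' || ndigits == 0)
        return 0;

    *value = v;
    return 1;
}
```

A real matcher would read from the source stream and build an integer constant node rather than return an unsigned long.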

The code compiles again.

July 5

More stuff on intrinsic type conversions, mostly in intrinsic.c. The code still does not compile or work.

July 4

Did a little more work on the intrinsic type conversion, adding limit checking to the actual conversion functions added the other day. Also got rid of some of the placeholder stuff that was there before. The code does not work at the moment.

Once completed, the REAL, INT, and CMPLX handlers should provide Katherine with some good examples of complicated intrinsic handlers-- REAL(1) is different than REAL((1,0)).

A couple of days ago, Tobi Schlüter sent a patch that held locus information in a variable named where. I've changed the name of the expr_locus member of the expression nodes (and all references) to where because it is such a better name.

July 2

I did a survey yesterday of what remains to be done. The results are in the BUGS file. Niels Jensen pointed out that substring parsing was not there.

Tobi Schlüter sent a bug fix that adds locus information to the new expression parser. This was causing error messages to crash.

I've added a couple of subroutines to arith.c that convert between constants of various types. There is no range checking at the moment and they are uncalled, but they will form the basis for simplifying REAL, INT and CMPLX intrinsics.

July 1

Niels Jensen sent a bug report that had to do with a complex constant causing a crash. Fixed it.

Worked today on how intrinsic functions are resolved and simplified. The ABS function now works in the sense that it is recognized to be intrinsic and selects the correct function by the type of its argument. It even simplifies constant arguments. The plan now is to get the REAL, DBLE and CMPLX functions going and replace the existing _convert placeholder function.

June 29

No code tonight, but lots of thought, mostly about the details of how the resolution phase has to work and how simplification works its way into this process.

My fundamental realization was that the resolution phase rewrites expression nodes. Depending on its argument list a function can reference a wide variety of things, even within the same function. The start of the resolution phase figures out if the name is generic, specific or neither. This part doesn't depend on the argument list, and so this information can be stored with the symbol.

Depending on the status of the symbol, one of three procedures is followed to determine what a function actually refers to. The function reference is represented as an expression node with an expr_type of EXPR_FUNCTION. The 'symbol' member points to the symbol being called. By examining the argument list, we "retarget" the function to call to point to another name-- an external function, an intrinsic function or whatever.

By a "name", I mean a real symbol name, one that makes it to the assembler. For an intrinsic function, the name will contain characters that make it uncallable by any method other than g95 determining that it is a reference to an intrinsic subroutine.

Simplification of a function call can only happen for intrinsics and can happen as soon as a function is determined to actually reference that intrinsic. Simplification also has to be callable separately for initialization expressions, because these functions have to be simplified before the resolution phase. In this case, there is no resolution and function calls must refer to an intrinsic.

There is one weird case I have to investigate before actually moving ahead-- the standard points out that a name in a module can be renamed by a USE statement, and the reference to this new name still refers to the old intrinsic...

June 28

Added a patch sent by Niels Jensen a couple of days ago-- this one allows warnings to be deferred until the statement is accepted. This eliminates the problem of issuing a warning in a statement matcher only to have the statement fail to match later. This situation results in a bad warning message. This patch stores the warning message in the same manner as an error message.
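The idea of holding a warning back until the statement is accepted can be sketched roughly as follows. This is only an illustration of the buffering scheme described above; all of the names (defer_warning, accept_statement, reject_statement, warnings_shown) are invented for the example and the actual patch may be organized quite differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not the code from the patch. */
char pending_warning[256];
int have_pending = 0;
int warnings_shown = 0;       /* counts warnings actually printed */

/* Called from inside a statement matcher: only remember the message. */
void defer_warning(const char *msg)
{
    strncpy(pending_warning, msg, sizeof(pending_warning) - 1);
    pending_warning[sizeof(pending_warning) - 1] = '\0';
    have_pending = 1;
}

/* Called once the statement has matched: the warning is now valid. */
void accept_statement(void)
{
    if (have_pending) {
        fprintf(stderr, "Warning: %s\n", pending_warning);
        warnings_shown++;
    }
    have_pending = 0;
}

/* Called when the matcher fails: drop the message without printing. */
void reject_statement(void)
{
    have_pending = 0;
}
```

With this arrangement a matcher that warns and then fails leaves no trace, because reject_statement() discards the buffered text.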

Started working on some examples of simplification subroutines for Katherine. The first one is just ABS().

G95 is now 18,000 lines long.

June 27

Yesterday's update was a little late-- Sourceforge had some trouble. The internal motd said something about having problems with IRC people....

Not a lot of time for g95 today. I moved the arithmetic conversion stuff out of expr.c into arith.c. This has two purposes-- it gets rid of calls to the GMP library from outside arith.c which clears the way toward replacing GMP. The second purpose is to have the compiler do more arithmetic on its own, in particular being able to evaluate those intrinsic functions that it has to be able to evaluate.

For example, if we want to evaluate the ABS() intrinsic, we have to be able to compare a value with zero and negate it if necessary. The comparison and negation are already there, but we have to have some way of allocating an expression node which has a zero value for the comparison to be possible.
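A minimal sketch of that ABS() evaluation, assuming a drastically simplified expression node that holds a plain long instead of a GMP value. The type and function names (expr, constant_node, compare_nodes, negate_node, simplify_abs) are placeholders for this example, not the routines in arith.c.

```c
#include <stdlib.h>

/* Simplified, hypothetical expression node holding an integer constant.
 * The real nodes carry a type, a kind and an arbitrary-precision value. */
typedef struct expr {
    long value;
} expr;

/* Allocate a constant node; returns NULL if memory runs out. */
expr *constant_node(long value)
{
    expr *e = malloc(sizeof(*e));

    if (e != NULL)
        e->value = value;
    return e;
}

/* Three-way comparison of two constants: negative, zero or positive. */
int compare_nodes(const expr *a, const expr *b)
{
    return (a->value > b->value) - (a->value < b->value);
}

/* Negate a constant in place (overflow of the most negative value is
 * ignored in this sketch). */
void negate_node(expr *e)
{
    e->value = -e->value;
}

/* Simplify ABS(e) in place by comparing against a freshly allocated
 * zero node.  Returns 0 on success, -1 on allocation failure. */
int simplify_abs(expr *e)
{
    expr *zero = constant_node(0);

    if (zero == NULL)
        return -1;
    if (compare_nodes(e, zero) < 0)
        negate_node(e);
    free(zero);
    return 0;
}
```

The point of the sketch is the zero node: comparison and negation only work on nodes, so the compiler needs a way to manufacture one for a constant it thinks of by itself.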

June 26

Niels Jensen sent in some minor patches that fixed typos and such. I accepted a revision of intrinsic.c that contains a lot of cosmetic cleanups and we are working on a new scheme for delaying g95_warning() messages from being displayed until the line to which the warning applies has been accepted. This prevents a wrong message from showing up if the statement is rejected.

The FSF has gotten in touch with me regarding an account that I will be able to use to check the status of copyright assignments. Of course, they use Kerberos, which means I have to install yet another software package...

June 25

Major rewrite on the expression matcher. This is actually one of the earliest parts of g95 that was written. After the mechanics of scanning were complete, it seemed like a lot of things depended on matching expressions. After all, FORTRAN is FORmula TRANslation.

This matcher was a simple infix parser that took a stream of tokens provided by a lexical analyzer and built them into a tree of expression nodes. The fundamental problem was that I was never quite sure that this parser followed all of the rules dictated by the standard. For example, something like: A+-B is not allowed by the rules, but A.EQ.-B is.

Anyway, the patches to fix the parser kept piling up and it wasn't clear what problems were still there and how to find them.

The new expression matcher works the same way as other matchers within g95-- we try to match something and if that doesn't work, we try to match something else. This also lets us implement the rules for matching expressions as they are in the standard and gives me some confidence that we are doing things right, without hidden surprises.
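To make the "try to match, back up if it fails" style concrete, here is a toy recognizer in C. It is a sketch under heavy assumptions: operands are single letters or digits, there is no whitespace or case folding, only .EQ. is supported among the relational operators, and the grammar is a cut-down version of the standard's expression levels. None of the function names come from g95.

```c
#include <ctype.h>
#include <string.h>

/* Toy recognizer, not g95's matcher.  Cut-down grammar:
 *   level-4      : level-2 [ .EQ. level-2 ]
 *   level-2      : [ sign ] add-operand { sign add-operand }
 *   add-operand  : mult-operand { (*|/) mult-operand }
 *   mult-operand : primary [ ** mult-operand ]
 *   primary      : letter | digit | ( level-4 )
 */
static const char *pos;            /* current position in the source */

static int match_level4(void);

/* Try to match a literal string; consume nothing on failure. */
static int match_str(const char *s)
{
    size_t n = strlen(s);

    if (strncmp(pos, s, n) != 0)
        return 0;
    pos += n;
    return 1;
}

static int match_primary(void)
{
    const char *start = pos;

    if (isalnum((unsigned char) *pos)) {
        pos++;
        return 1;
    }
    if (match_str("(") && match_level4() && match_str(")"))
        return 1;
    pos = start;                   /* back up */
    return 0;
}

static int match_mult_operand(void)
{
    const char *start = pos;

    if (!match_primary())
        return 0;
    if (match_str("**") && !match_mult_operand()) {
        pos = start;
        return 0;
    }
    return 1;
}

static int match_add_operand(void)
{
    const char *start = pos;

    if (!match_mult_operand())
        return 0;
    while (match_str("*") || match_str("/")) {
        if (!match_mult_operand()) {
            pos = start;
            return 0;
        }
    }
    return 1;
}

static int match_level2(void)
{
    const char *start = pos;

    if (!match_str("+"))
        match_str("-");            /* a sign is allowed only at the front */
    if (!match_add_operand()) {
        pos = start;
        return 0;
    }
    while (match_str("+") || match_str("-")) {
        if (!match_add_operand()) {
            pos = start;
            return 0;
        }
    }
    return 1;
}

static int match_level4(void)
{
    const char *start = pos;

    if (!match_level2())
        return 0;
    if (match_str(".EQ.") && !match_level2()) {
        pos = start;
        return 0;
    }
    return 1;
}

/* Returns 1 if the whole string is an expression in the toy grammar. */
int valid_expr(const char *src)
{
    pos = src;
    return match_level4() && *pos == '\0';
}
```

Because the sign is only accepted at the start of a level-2 expression, valid_expr("A+-B") fails while valid_expr("A.EQ.-B") succeeds, without any special-case check for a unary operator following a binary one.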

The downside of this method (in general) is that it is slower than the stream of tokens approach. This is because the token approach can back up sooner when a particular syntactic state is determined to be wrong. The matching approach of g95 eats larger "tokens" and consequently will need more backing up.

When running it on my thesis project, I get the feeling that it is slower than it was before, but it is still acceptable. It is faster than g77 (which is doing a lot more than just parsing fortran), but as long as the compile times are on the same order of magnitude, we're fine.

I've applied patches sent by Tobi and Niels that fix a lot of small things. My mailbox is now down to fewer than a dozen letters for the first time in a long time. My next priority is to get Katherine going again, no matter how many patches I get peppered with...

June 22

Added a patch sent by Niels Jensen that added the $-format descriptor to the FORMAT checking code. We also changed the DO matching so that it avoids leaving symbol table modifications lying around, just like the IF statement a couple of days ago. The DO WHILE statement now generates a regular EXEC_DO node and a new EXEC_DO_WHILE. This new node type made more sense than dealing with the overloaded meanings of the g95_iterator structure.

Tobi and Niels still have several emails pending...

June 21

Applied a patch sent by Niels Jensen that cleaned up the source, including adding a display_help() function that looks sort of like g77.

Niels also sent a patch to correct a problem matching tags associated with variables. This revealed a deeper problem in that we really need to be matching a very restricted form of expressions and not symbols. This caused lots of changes in io.c.

He also found a problem with matching real literal constants that I introduced last night. It is fixed now, along with the corresponding bug in the subroutine matching complex numbers.

June 20

Finished adding a patch sent by Tobi Schlüter for matching complex constants. I also made the subroutines for matching a component of a complex constant similar to that of matching a single real constant. In the process, I found a bug that dropped the last digit of these constants.

Added Tobi's preliminary patch to match pointer assignments. I haven't had a chance to test it yet.

Applied a patch sent by Niels Jensen that correctly matches an implied unit expression in a format specifier.

Claus Fischer wrote in to point out that the ENTRY statement wasn't being recognized as an executable statement. This has been fixed.

June 19

Now that we're adding more and more command-line options, it seemed like it was time to figure out a consistent way to handle how options are set and stored. To this end, I've added a typedef-ed structure called g95_option whose members contain all the options from the command line. The current options are there now.
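A structure of this kind might look roughly like the sketch below. The member names and the parse_options() helper are guesses made for illustration; only the flags -v, -r and the proposed -pedantic are mentioned anywhere in these notes.

```c
#include <string.h>

/* Member names are assumptions; the notes only say that the structure
 * holds every option from the command line. */
typedef struct {
    int verbose;    /* -v: print the namespace and code structures */
    int resolve;    /* -r: run the resolution phase */
    int pedantic;   /* -pedantic: proposed in a patch, see the text */
} g95_option;

/* Fill *opt from argv and return the number of non-option arguments
 * (source file names, in this sketch). */
int parse_options(int argc, char **argv, g95_option *opt)
{
    int i, others = 0;

    memset(opt, 0, sizeof(*opt));   /* every option defaults to off */

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-v") == 0)
            opt->verbose = 1;
        else if (strcmp(argv[i], "-r") == 0)
            opt->resolve = 1;
        else if (strcmp(argv[i], "-pedantic") == 0)
            opt->pedantic = 1;
        else
            others++;
    }
    return others;
}
```

Keeping every flag in one structure means a new option costs one member and one comparison, rather than another global variable.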

Niels Jensen sent a patch to add a -pedantic option, and I think this is something we definitely want to have, but there are some problems with just calling g95_warning()-- such messages appear immediately when they should not be displayed. More work is necessary here.

Fixed a bug I added yesterday when checking for old-style size specifiers. Both Niels Jensen and Tobi Schlüter sent patches for the fix.

Claus Fisher and Niels Jensen identified a bug associated with parsing a FORMAT statement. Niels tracked the bug down to g95_gobble_whitespace() and stomped it.

Niels also sent a fix for a bug in the matching of a BLOCK DATA statement that I am shocked that I missed as well as fixes to the OPEN and INQUIRE statements. He also sent a patch to allow a dollar sign in FORMAT statements.

Tobi Schlüter has two patches pending which are a little larger and more involved-- matching complex constants and pointer assignments. I will get to these soon.

My mailbox is now down to a manageable number of letters. Hopefully this situation will remain this way for a while...

June 18

Things are improving. I recently bought a DSL line and trying to get it working made it really obvious how out of date my system was. I was running x86 Linux 2.0.27 based on a version of Slackware that was at least six years old. After many hours of trying to make things work piecemeal, I bought RedHat 6.2 for $16, backed up everything on tape, installed 6.2 and pulled selected bits off of the tape. Right now everything works pretty much as it did before, except that I have a usable system and a fast, live connection to the net. Life is good.

I finished applying Steven Johnson's patch with one little exception that had to be made for a user operator named '.e.' or '.d.' and improved error diagnostics on expressions. I've also modified g95_match_interface() not to accept defined operator names with non-alphabetic characters. A long time ago, I can remember being quite puzzled as to why the standard specified this...

The work on this problem led to the discovery of a serious problem in the IF statement, which led to the same serious problem in the decode_statement() subroutine. The problem was failing to abide by the convention that all matchers are allowed to mess with the symbol table, with the understanding that on MATCH_NO or MATCH_ERROR any changes would be undone by the caller. This wasn't happening all the time in decode_statement().
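The convention can be sketched in a few lines of C. MATCH_NO and MATCH_ERROR are the names used above; MATCH_YES, the toy symbol table and every function name below are assumptions made for this illustration only.

```c
#include <stddef.h>

/* MATCH_NO and MATCH_ERROR appear in the notes; MATCH_YES and all other
 * names here are assumptions made for this sketch. */
typedef enum { MATCH_NO, MATCH_YES, MATCH_ERROR } match;

#define MAX_SYMBOLS 64
const char *symbols[MAX_SYMBOLS];
int nsymbols = 0;      /* symbols currently in the table */
int ncommitted = 0;    /* symbols that survive an undo */

/* Matchers may call this freely while they work. */
void add_symbol(const char *name)
{
    if (nsymbols < MAX_SYMBOLS)
        symbols[nsymbols++] = name;
}

void commit_symbols(void) { ncommitted = nsymbols; }
void undo_symbols(void)   { nsymbols = ncommitted; }

/* The caller's half of the convention: keep changes only on success. */
match run_matcher(match (*matcher)(void))
{
    match m = matcher();

    if (m == MATCH_YES)
        commit_symbols();
    else
        undo_symbols();
    return m;
}

/* Two toy matchers: both touch the symbol table, only one succeeds. */
match failing_matcher(void) { add_symbol("x"); return MATCH_NO; }
match good_matcher(void)    { add_symbol("y"); return MATCH_YES; }
```

Calling run_matcher(failing_matcher) leaves the table exactly as it was, which is the guarantee that was being broken when the undo step was skipped.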

This convention has serious consequences for the matching of a simple IF statement, that is, an IF-clause followed by a single executable statement. The single executable statements have their own matchers that we need to call. While we can distinguish between which matcher to call by looking at the next keyword (which doesn't affect the symbol table), the problem is that an assignment statement can start with the same keywords. The problem that was happening was that the symbol table had been modified by the IF's control expression and these changes were being undone in the process of successively matching the different action statements.

The much more careful algorithm first matches the IF-expression, then tries to see what comes next. If it is an arithmetic IF (Steven Johnson noted this was missing a while back) or an IF-THEN, these are matched straight through, because doing so does not affect the symbol table. We then try to match an assignment statement. If this doesn't work, we undo symbols (getting rid of things built up by the assignment matching, as well as the control expression). We then re-match the IF-expression part (which is guaranteed to succeed, since it worked before), then we peek at the next keyword and call the appropriate statement matcher to see if the rest is correct. More involved, but it works correctly.

Applied a patch sent by Steven Johnson that set the length of a CHARACTER declaration to one when nothing else was present.

Fixed a bug sent by Niels Jensen in which the character-constant matcher generated an error too quickly instead of a MATCH_NO.

Fixed a pair of bugs associated with another problem Niels sent. The array-reference matching didn't work right on a whole array reference. The other problem was that the literal constant matching subroutines couldn't match signed quantities. The matchers were written this way on purpose because '-' is also an operator that has to be matched separately. The numeric constant matching subroutines now take a flag that indicates whether to match a sign or not.

If someone wants to write the matcher for binary, octal and hexadecimal integers, please go ahead.

Fixed a problem with matching literal constants sent by Tobi Schlüter. The matcher has to try and match a character constant before an integer constant because the kind parameter for a character constant is in front and it is a valid integer expression.

Applied a patch sent by Niels Jensen that fixed some problems with the kind-number assignments. In g95, the kind numbers for complex numbers are the same as those for real numbers. Another unstated rule is that for numeric kinds, the kind values are sorted by precision, so that if k1>k2, then k1 has more precision.

Fixed a problem reported by Niels Jensen having to do with length specifications following actual variables not being recognized. This is fixed now.

Claus Fisher sent in three bug reports. One was the failure to fully evaluate an initialization expression (still working on that), another was the problem of reading a real number instead of an integer followed by an operator, which today's fixes take care of, and the last was

SUBROUTINE X2
IF (A.EQ.-2) THEN
H=1
ENDIF
END

which generates an error complaining about an intrinsic unary operator following a binary operator. This error was intended to flag things like A**-B. It turns out that this restriction is only for numeric operators. At this point, I am tempted towards throwing away the current g95_match_expr() subroutine and replacing it with something that is more in line with how the standard explains expression composition-- level one through five expressions each composed in various ways that automatically generate the precedence rules. This is opposed to the existing fairly textbook infix parser that is trying to mimic these rules.

Added support for specifying kinds in the nonstandard *<number> format, which it turns out was never a part of any standard. On the plus side, g95 now compiles my thesis project without complaints...

There are a couple patches I have not put in just yet. I will try to get to those tomorrow.

June 15

Worked on applying the patches by Steven Johnson and others concerning the over-greedy tendency of the real-number matching subroutine to eat its way into an operator in certain cases. I've convinced myself that their fix is OK because I can't think of any legal context in which an alphabetic character can follow a floating-point number. Deep in the standard, user defined operators are prohibited from containing digits.
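The rule can be sketched as follows: match a real number greedily, then give the '.' back if a letter follows. This is only an illustration of the idea, not the patch itself; match_number is a made-up name, and kind suffixes and reals that begin with '.' are left out to keep it short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not g95's matcher.  Returns the number of
 * characters of s that form a numeric literal and sets *is_real.
 * Returns 0 if s does not start with a digit. */
size_t match_number(const char *s, int *is_real)
{
    size_t n = 0, int_len, e;

    *is_real = 0;
    while (isdigit((unsigned char) s[n]))
        n++;
    int_len = n;
    if (int_len == 0 || s[n] != '.')
        return int_len;

    /* Greedy attempt at a real: '.', fraction digits, optional exponent. */
    n++;
    while (isdigit((unsigned char) s[n]))
        n++;
    if (s[n] != '\0' && strchr("eEdD", s[n]) != NULL) {
        e = n + 1;
        if (s[e] == '+' || s[e] == '-')
            e++;
        if (isdigit((unsigned char) s[e])) {
            /* Only a letter followed by digits counts as an exponent. */
            while (isdigit((unsigned char) s[e]))
                e++;
            n = e;
        }
    }

    /* A letter cannot legally follow a real constant, so the '.' must
     * have started an operator such as .gt. or .and.: fall back to the
     * integer that was seen first. */
    if (isalpha((unsigned char) s[n]))
        return int_len;

    *is_real = 1;
    return n;
}
```

On "1.and.flag" this returns 1 with *is_real cleared, leaving ".and." for the operator matcher, while "1.e5" is still consumed whole as a real.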

I am quite tired tonight, so checking it will have to wait. Hopefully things will settle down soon.

June 12

Started dealing with the backlog associated with the current volume of mail on the list.

Steven Johnson sent in a patch to add a DOUBLE COMPLEX support in type-matching statements. As I mentioned in email, Bill Clodius pointed out this deficiency long ago and I put off implementing it, mainly because DOUBLE COMPLEX is not part of the Fortran 95 standard. Nevertheless, I have included it because it is something that people seem to expect.

Tobi Schlüter sent in a patch to match constant complex numbers. I took part of it and asked for part of it to be rewritten. Support for complex numbers amounts to adding more GMP dependencies at a time when I wanted to start moving away from that, but it seems to me that it enhances g95's ability to parse fortran 77 programs for the moment, which enhances people's ability to test g95.

I've also added Steven Johnson to the CONTRIB file.

There are a couple of definite bugs left in my inbox, but I am out of time for tonight.

June 11

Finished adding parsing for the DATA statement. It looks like it works. Tobi pointed out that the INCLUDE directive replaces itself with the contents of the file and *does not* change the form of the program being parsed. That's fixed now.

The mail list had a lot of traffic today, about a dozen letters. There were several patches, including adding a DOUBLE COMPLEX keyword, which many people expect even though it isn't part of the standard. There was a patch to add parsing of complex numbers. The worst news was how g95 fails to parse

2.gt.1.and.flag  

correctly. The problem is that g95 relies fundamentally on "greedy" matching-- the "1." is read as a floating point 1.0 instead of integer one. Not sure how I am going to deal with this at the moment.

I will look at the accumulated messages in more detail tomorrow night or perhaps Tuesday. G95 is now 17,000 lines long.

June 10

Lots of diverse things today:

Fixed bugs in the data transfer list matcher so that an explicit UNIT= tag is parsed correctly.

Rewrote the next_fixed function so that it uses the g95_next_char functions instead of examining the line buffer directly. This was because the INCLUDE directive plays fast and loose with the line buffer. The previous version was also not quite correct with respect to comments.

Realized that there was a problem in including files formatted in a different form than the current form. The upshot was that the first line of such files would be read in the wrong form. The solution was to move the include logic up a level into the next-statement function.

Added parsing of statement functions.

Started working on parsing a DATA statement. This is the last of the fortran 77 statements that I know about.

In fact, the last couple of bugs have been found by running g95 on programs that are a couple hundred lines long. I've added a new -r option that runs the resolution phase-- this prevents problems with not being able to fully resolve names yet.

June 9

Added a command line option which is meant to be for debugging purposes only. The -v option is 'verbose', and controls the printing of the namespace and code structures. They were printed by default, but now a -v must be given. This is to prevent lots of text from being printed on program units that are just fine.

Tried tracking down the problem Tobi found the other day, but didn't get very far. I think some cleanup has to be done in the area of kind parameters-- this code was written very early in g95's history and there appear to be redundant subroutines floating around...

June 8

I mailed some notes on intrinsic interface checking to Katherine today, a copy should be in the mailing list archives by now. Not a lot of chance for work on g95 today.

June 6

Niels Jensen sent in another bug associated with matching array subscripts which is now fixed.

Wrote a function g95_match_variable(), which matches a variable that can be assigned. Something like this is required for a DATA statement that looks like

DATA A / 6 /  

If this matcher used g95_match_expr, the expression "A/6" would be matched, which isn't correct at all. There are a bunch of other places that will be cleaner with a variable-matching subroutine.

Fixed some serious typos in last night's update; I was pretty wiped...

June 5

Katherine Holcomb, who works with the Legion project, has arranged an account for me-- this means easy access to a couple of commercial f90 compilers for when the standard is vague and an alpha platform for testing.

Unfortunately, I didn't have any time for g95 tonight.

June 4

Niels Jensen wrote in with a few bug reports and I've fixed all the ones he found and a few others. I've actually started applying g95 to larger blocks of fortran 77 code, with some encouraging results-- some problems still exist, but there are a lot of statements that are read correctly.

June 3

Implemented parsing of the FORMAT statement. One of my pet peeves about fortran compilers over the years is compilers that defer the error checking to the library. It really sucks to specify a bad format that causes your program to crash after several hours, just before it prints a result you wanted.

Although I am more experienced now and never make format errors (well, hardly ever), I still think a good compiler should check constant format strings. It took quite a chunk of code to do this, and now there is a new source file, format.c.

Also started work on logic to keep track of labels within a program unit, making sure there are no duplicate labels defined and that labels are used in the correct manner.
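The bookkeeping for labels could be as simple as the following sketch, which assumes one table per program unit and checks only two things: duplicate definitions, and references to labels that are never defined. The function names are invented for this example and the real logic also has to check how each label is used.

```c
#define MAX_LABEL 99999   /* a statement label has one to five digits */

/* Hypothetical per-program-unit tables, indexed by label value. */
static unsigned char label_defined[MAX_LABEL + 1];
static unsigned char label_referenced[MAX_LABEL + 1];

static int valid_label(int label)
{
    return label >= 1 && label <= MAX_LABEL;
}

/* Record a definition.  Returns 1 if recorded, 0 if the label is out of
 * range or was already defined (a duplicate). */
int define_label(int label)
{
    if (!valid_label(label) || label_defined[label])
        return 0;
    label_defined[label] = 1;
    return 1;
}

/* Record a reference.  Returns 1 if recorded, 0 if out of range. */
int reference_label(int label)
{
    if (!valid_label(label))
        return 0;
    label_referenced[label] = 1;
    return 1;
}

/* At the end of the program unit: the lowest label that was referenced
 * but never defined, or 0 if there is none. */
int first_undefined_label(void)
{
    int i;

    for (i = 1; i <= MAX_LABEL; i++)
        if (label_referenced[i] && !label_defined[i])
            return i;
    return 0;
}
```

The undefined-label check has to wait until the end of the unit, since a GOTO may refer to a label that appears later in the source.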

June 1

Katherine Holcomb sent back a copy of intrinsic.c with all of the elemental functions marked.

Other than that, not a lot of time for g95. As soon as g95 can parse all of the statements correctly, it will be in the 'larva' state. When we generate code, the 'pupa' state. When g95 is done, we'll see if it will be a beautiful butterfly, or just a big bug....

May 30

More statement parsing. Added the EQUIVALENCE, ENTRY, MODULE PROCEDURE and SAVE statements. This might sound like a lot, but these last couple of statements generally just sort of save the data without really doing much with it.

On the other hand, we're almost to the point where anyone in the world can run actual fortran programs through g95 and help debug the parser. When we get to that point, I'll start posting binaries on the website so that people won't have to compile g95 themselves.

The DATA and FORMAT statements are all that remain of the legacy Fortran 77 statements. The FORALL, USE and WHERE statements are the remaining Fortran 90/95 statements.

G95 is now 16,000 lines long.

May 29

Implemented the parsing of the NAMELIST and MODULE statements.

May 28

Implemented the parsing of the COMMON statement. It appears to work... I am taking a break from name-resolution questions for a while.

May 27

Spent too much time thinking about a conundrum involving name resolution. I'll post the question to comp.lang.fortran tomorrow.

After that, I thought about what's necessary for intrinsic function resolution. After scaling back the original plan to something simpler (identifying elemental intrinsics), I've passed that on to Katherine.

Other than that, I implemented the BLOCK DATA statement, but not the BLOCK DATA parser.

May 26

Small changes to add_sym() to copy the simplify function pointer into the symbol table being built. I am currently thinking a lot about how symbol resolution has to work.

Even though this is a long weekend for the US (Memorial Day), my mother is coming to visit and I will probably not have a lot of time for g95.

May 25

Did some thinking about how the resolution process will affect symbols, and noticed that the PARAMETER attribute was a bitfield in the attribute list when it should have been one of the flavors, because a PARAMETER is mutually exclusive with a lot of the other flavors.

Debugged those changes and the changes to intrinsic.c the other day regarding the new simplification member.

May 23

Updated intrinsic.c a bit, adding a new field to the structure that holds a description of each intrinsic procedure. The field is a pointer to a subroutine that will be responsible for simplifying an expression node that holds a call to that procedure. Some of these are required for a compiler and some are optional. This meant a bit of typing...
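As a sketch of what such a field looks like, the following C fragment keeps a function pointer next to each intrinsic's name and calls it when present. Everything here is illustrative: the structure layout, the names and the two table entries are assumptions, not the contents of intrinsic.c.

```c
#include <stddef.h>
#include <string.h>

/* All names here are illustrative, not taken from intrinsic.c. */
typedef struct expr {
    double value;
} expr;

typedef int (*simplify_fn)(expr *);   /* returns 1 if the node was folded */

typedef struct {
    const char *name;
    simplify_fn simplify;             /* NULL: no compile-time simplifier */
} intrinsic_sym;

/* INT() on a real constant truncates toward zero. */
static int simplify_int(expr *e)
{
    e->value = (double) (long) e->value;
    return 1;
}

static const intrinsic_sym intrinsics[] = {
    { "int",  simplify_int },
    { "sqrt", NULL },                 /* simplifier not written yet */
};

/* Look the name up and call its simplifier, if it has one. */
int try_simplify(const char *name, expr *e)
{
    size_t i;

    for (i = 0; i < sizeof(intrinsics) / sizeof(intrinsics[0]); i++)
        if (strcmp(intrinsics[i].name, name) == 0)
            return intrinsics[i].simplify != NULL
                   && intrinsics[i].simplify(e);
    return 0;
}
```

A NULL pointer is a convenient way to mark the optional simplifiers that have not been written: the call is simply left in place for run time.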

May 22

I got a good answer from Steve Johnson about my question on va_arg, which read in part:

[...]

This is neither legal nor portable, but fortunately there is a proper way to do it, using the __va_copy macro provided by gcc. This has actually been adopted by the ANSI C99 standard, where it is called va_copy (it would be good to use your autoconf script to check for the right name). (In ANSI C89, copying of va_list variables was simply impossible to do portably.)

The va_copy macro was one of Martin Otte's solutions, but I didn't know how portable it was at the time.
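For reference, a minimal sketch of the portable approach, assuming a C99 compiler that provides va_copy. The functions count_and_sum and demo_error are made up for the example and only understand %d; the real error_print() is not reproduced here.

```c
#include <stdarg.h>

/* Hypothetical sketch: walks the same argument list twice, which is what
 * forces a copy.  Only %d is understood; everything else is ignored. */
static int count_and_sum(const char *format, va_list argp0, int *sum)
{
    va_list argp;
    const char *p;
    int count = 0;

    va_copy(argp, argp0);          /* C99; older gcc spells it __va_copy */

    /* First pass, over the copy: count the integer arguments. */
    for (p = format; *p != '\0'; p++)
        if (p[0] == '%' && p[1] == 'd') {
            (void) va_arg(argp, int);
            count++;
        }
    va_end(argp);                  /* every va_copy needs its va_end */

    /* Second pass, over the original list: add the values up. */
    *sum = 0;
    for (p = format; *p != '\0'; p++)
        if (p[0] == '%' && p[1] == 'd')
            *sum += va_arg(argp0, int);

    return count;
}

/* Variadic front end in the style of an error routine. */
int demo_error(int *sum, const char *format, ...)
{
    va_list argp;
    int count;

    va_start(argp, format);
    count = count_and_sum(format, argp, sum);
    va_end(argp);
    return count;
}
```

Plain assignment of one va_list to another fails to compile on platforms where va_list is an array type, which is why a dedicated macro is needed.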

Toon Moene wrote in to point out some compilation warnings in the current version of error_print() on his alpha. The easy fix was to replace the single array of saved argument pointers by several small arrays of real arguments-- integers, characters and character pointers. This implementation limits how many arguments can be called in g95_error(), but since this is only within the compiler itself, it doesn't seem like too serious a limitation.

Martin Otte reports that the current version compiles fine on the PPC.

Spent some time debugging intrinsic.c, fixing lots of fairly simple bugs. The function resolution subroutine now calls the resolution for intrinsic functions. So calling a function at the moment assumes that it is an intrinsic function being called.

It shouldn't be too long until Katherine can resume working on functions to verify the interfaces for intrinsic functions that don't fit the simple model in place now. I think another possible project will be for someone to implement functions that emulate intrinsic functions within the compiler.

May 21

Got caught up in some excitement elsewhere last night and forgot to update the webpage. All the modifications for the resolution phase in existing code appear to be complete. The rewrite of error_print() is also complete. On to new stuff!

May 19

Martin Otte wrote back that my fix didn't work and sent a couple of suggestions. After only one compiled here, I realized that doing anything with va_list other than messing with the va_* functions/macros is a bad idea. So I need to rewrite error_print().

Other than that, I finished resolving the SELECT statement and added resolution for the DO statement and made a start on array specifications.

May 18

Martin Otte wrote in to say that he had compiled g95 on a PPC-based Macintosh running Linux. The only problem he had was a little illegal manipulation of a va_list type on my part in the error_print subroutine at:

static void error_print(char *type, char *format0, va_list argp0) {
...
va_list argp;
...

argp = argp0;

I've sent him a hack that replaces the assignment statement with a memcpy. I'm not sure how legal or portable this will be... any ideas would be appreciated. I thought I was pretty good with C, but I need a wizard's opinion. If portability is really bad here, we can probably code around the need for the copy/assignment.

After that, I started working on moving type-checking code from the parsing phase to the new resolution phase. The assignment, IF and alternate RETURN statements have been converted. The SELECT statement is taking a lot more effort.

Sourceforge permissions were screwed up again yesterday, but appear to be fixed. Again.

May 17

After much thought, I've decided to resume work on the current g95 rather than try retargeting SGI's compiler for the gcc back end. A couple of other reasons for the decision are:

  • It is always easier to deal with your own code instead of someone else's.
  • The SGI compiler is only fortran 90 with a few fortran 95 extensions. SGI is working on this.
  • Several people besides myself have a good enough understanding of g95 to be able to submit bug fixes and new parts of code.
  • G95 is meant to use gcc from the start as opposed to retrofitting it.
  • G95 is nearly ready to start talking to the back end itself. If g95 were not as far along as it was, I would probably have looked longer and harder at retargeting.
  • Initial progress towards retargeting the SGI compiler would be extremely slow and difficult.

One of the most ironic things about programming is that scientists, who produce some of the worst code ever written, generally spend most of their time adapting someone else's code for their own purposes. As a scientist myself, I've been there and don't like it much either.

Despite the large size of the SGI front end (380k lines) and the embryonic (now larval?) g95 compiler, I think that g95 will end up being much smaller than the SGI compiler because of g95's totally different approach towards reading a fortran 95 source file. The SGI authors opted for the traditional token-based compiler which complicates the lexical analyzer a great deal.

There were some licensing issues regarding g95, SGI-IA-64 and gcc which I regarded in the end as relatively unimportant. Anyone interested can follow the banter on g95-develop and gcc mailing lists. My view is that the licensing issues determine g95/SGI-64's "final resting place", as it were, and little else.

What I want is a fortran 95 compiler that will work on most of the computers that I will be using over the next couple decades. I am therefore heading in that general direction.

May 15

Big News-- SGI has released the source to their fortran 90 compiler. The license is GPL-2, which doesn't pass the FSF's smell test.

I am not sure how this will affect g95's development. Continue on with the current g95? Jump ship and retarget SGI's compiler to produce RTL instead? (license problems here). Convince SGI to change the license to GPL-1?

We live in interesting times...

May 14

Worked more on making the resolution phase of compilation a reality. Mostly it involved moving stuff around, from the parsing phase to different subroutines happening later.

I started debugging and the first bug turned out to be that the new resolution phase is never even called, and that generated a big question about when to call it. It turns out that it has to be called at the very end of the program unit parsing, after any internal subroutines have been parsed themselves.

The main subprogram has to be resolved first, followed by the internal subprograms. This ordering suggests that we traverse the namespace structures, which are linked exactly in this manner. Not much time for further work today.

The permissions on sourceforge are still incorrect, so files you see will be from the 11th.

May 13

The permissions on the sourceforge ftp server are screwed up again. What you see there is the May 11 upload.

Fixed the subroutines I wrote last Friday to handle zero-length argument lists, and also cleaned them up a bit. Added a subroutine to do type and kind comparisons between argument lists and added subroutines that call these subroutines to check intrinsic functions and subroutines. At this point, we should be able to check those subroutines that don't require any special checking.

I got around to thinking about testing this new stuff, and the easiest way seems to be to just hook the intrinsic name-resolution into the big name resolution that has to happen at the end of compiling a program unit. If you've never read Chapter 14 of the standard, it's got the most dense prose of anywhere in the standard. As usual, I'll implement it simple-but-wrong first and work on improving it later.

I've been thinking more about projects that other people could work on, and one other thing that we could use is implementing functions that do the work of intrinsic functions within the compiler-- ie if certain intrinsics operate on constants, then we should precompute the answer rather than letting it be done at runtime. More on this later.
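A minimal sketch of what such compile-time evaluation could look like. The expr structure and the fold_abs() and fold_max() names are hypothetical stand-ins for illustration, not g95's actual data structures, and only integer constants are handled:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified expression node: either an integer constant
   or something whose value is only known at run time. */
typedef struct {
    bool is_constant;
    long value;              /* valid only when is_constant is true */
} expr;

/* Try to fold ABS(arg) at compile time.  Returns true and stores the
   folded constant in *result when the argument is a constant; returns
   false when the call has to be left for run time. */
bool fold_abs(const expr *arg, expr *result)
{
    if (!arg->is_constant)
        return false;

    result->is_constant = true;
    result->value = labs(arg->value);
    return true;
}

/* The same idea for a two-argument MAX: fold only if both are constant. */
bool fold_max(const expr *a, const expr *b, expr *result)
{
    if (!a->is_constant || !b->is_constant)
        return false;

    result->is_constant = true;
    result->value = a->value > b->value ? a->value : b->value;
    return true;
}
```

Each folding function either replaces the call with a constant or declines, so a caller can try folding first and fall back to generating a runtime call.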

May 11

The sourceforge guys have fixed the permissions problems on the ftp server-- for the last four days, the world has been seeing the files from the 7th, since I could not overwrite them with new files. The current files are now in place.

Worked on subroutines for matching intrinsic argument lists. The idea is that given formal and actual lists, we sort the actual list so that each element in the actual list corresponds to an argument in the formal list, even if we have to allocate a blank actual argument node for an optional and missing argument.

Once the actual argument list has been suitably massaged, comparing formal vs actual arguments can be done by traversing both lists simultaneously.
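The sorting step can be sketched roughly as follows. This is an illustration under stated assumptions, not the g95 code: formal_arg, actual_arg and sort_actual_args() are hypothetical names, arrays are used in place of linked lists, and a blank slot is represented by a NULL pointer rather than an allocated blank node.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int optional;            /* nonzero if the argument may be omitted */
} formal_arg;

typedef struct {
    const char *keyword;     /* NULL for a positional argument */
    int value;               /* stand-in for an expression node */
} actual_arg;

/* Reorder the actual arguments so that sorted[i] corresponds to
   formal[i].  A missing optional argument leaves sorted[i] == NULL.
   The caller supplies sorted[] with room for n_formal entries.
   Returns 0 on success, -1 on a mismatch. */
int sort_actual_args(const formal_arg *formal, size_t n_formal,
                     const actual_arg *actual, size_t n_actual,
                     const actual_arg *sorted[])
{
    size_t i, j;

    for (i = 0; i < n_formal; i++)
        sorted[i] = NULL;

    for (j = 0; j < n_actual; j++) {
        if (actual[j].keyword == NULL) {
            /* Positional: goes in the slot with the same index. */
            if (j >= n_formal)
                return -1;           /* too many arguments */
            i = j;
        } else {
            /* Keyword: find the formal argument with that name. */
            for (i = 0; i < n_formal; i++)
                if (strcmp(formal[i].name, actual[j].keyword) == 0)
                    break;
            if (i == n_formal)
                return -1;           /* no such formal argument */
        }
        if (sorted[i] != NULL)
            return -1;               /* argument supplied twice */
        sorted[i] = &actual[j];
    }

    /* Any slot still blank must belong to an optional argument. */
    for (i = 0; i < n_formal; i++)
        if (sorted[i] == NULL && !formal[i].optional)
            return -1;

    return 0;
}
```

After a successful call, formal[i] and sorted[i] can be compared pairwise in a single loop, which is the simultaneous traversal described above.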

May 10

Niels Jensen's patch to get rid of match_dummy() and putting stubs in for the final statement matchers makes g95 over 15,000 lines long. Hooray!

Other than that, I've started writing subroutines to compare actual argument lists with what intrinsic procedures expect.

The problem with the sourceforge ftp is not fixed yet. While it looks like things are fine from the web, the ftp files are a couple of days old.

May 9

Not much time tonight for g95, but I did have some time to think about things. I am going to put the bug I found last night in a BUGS file, and get to it later (or let some other kind soul puzzle over it). Tomorrow I am going to start working on more infrastructure for checking the interfaces of intrinsic function calls.

Katherine and I figure that there are about ninety functions that will need to be written to check intrinsics that don't follow the table that we presently have. The sooner I can get this done, the sooner she can start working on those.

Niels has a patch pending that I'll get to tomorrow that removes the match_dummy() function and replaces references to it with the stubs for the few remaining statement matchers. Hurrah!

It looks like sourceforge's ftp is still stuck, so the snapshot there is a couple days old.

May 8

Applied several patches sent by Niels Jensen and caught up on a lot of mail (see the mail list). I spent what time I had today fixing segfaults on zero-length files and zero-length include files. Found a few bugs in the scanner. There is still a problem with including a file in fixed-form. The parser peeks at what the scanner is doing (not a good idea in the first place) and gets it wrong, which means that the scanner should be extended...

There appears to be some permissions problem with sourceforge's ftp directory tonight-- looks like 'ftp' owns all the ftp files. Perhaps someone did a chown -R on a huge set of directories.

May 7

Mark Dewing sent a patch that prevented freeing garbage in a CALL statement with a syntax error.

Other than that I worked on a bit of a rewrite of the attribute code. Mostly this involved moving some single bit attributes into the FLAVOR.

Niels Jensen, being a suspicious type, checked a zero-length fixed-form file as input to g95. It failed, and he sent a patch. Being a suspicious type myself, I checked including a null-length file... that failed as well.

There are several patches pending (some for a while) that I will try to get to tomorrow.

I will also try to respond to mails sent on the mail list over the last couple days.

May 5

Lots of web stuff today, moving us to g95.sourceforge.net. After a lot of poking around I think that sourceforge will be our best bet for a focal point for the g95 project for a long while. I am graduating from ASU soon and have been thinking about a postdoc in Europe for some time. Given that, finding a more permanent home for g95 was becoming a priority.

Sourceforge has a lot of neat features. It has the mail list that several people have signed up for, the web site and ftp service. For the future, they have CVS (via SSH) and a bug tracking database. Things we probably won't use are the web based forums and the SQL databases.

May 4

Realized that I messed up last night, confusing array references with array specifications. Fixed now.

Niels Jensen and Erik Schnetter both wrote in with minor problems which were mostly fixed. Erik sent a rather large letter to the list, some notes that he had written for a compiler some time ago. I answered what I could on the mail list.

Two major overhauls are pending: variable attributes and deferring type checking until after a program unit has been parsed. Plus my problems with carpal tunnel seem to have returned after being gone for a while, even without a coding binge. I've been typing mostly one-handed for two days now, and I may do nothing this weekend in order to let things heal.

May 3

For the past couple of days, I've been working on moving the g95 stuff to sourceforge.net. I almost rejected it when I had trouble finding the mailing list-- I think web-based forums are very clunky and will probably never use them. The list has been up for less than a day as I write this, and I just subscribed myself. To check the subscription, I checked their web page. I got a surprise-- I'm not the first subscriber!

Anyway, I've added a link to subscribing at the top of this page. Web pages should be moved soon.

Niels Jensen sent in a bug that caused a segfault on assumed-size arrays, which has been fixed, along with some cosmetic changes to match.c and the Makefile.

May 2

More work on the mail list... It's coming along.

Niels Jensen sent in a couple small documentation patches, and pointed out a bug in the assignment matching.

May 1

Worked a little bit on intrinsic.c in preparation for letting Katherine work on the subroutines that will verify the interfaces of intrinsic functions. There will probably be quite a bit of code here, but it will not be difficult.

Made some progress on getting our mail list and other things going.

Toon Moene pointed out some problems in header files on the alpha.

The FSF has received and processed the copyright assignments of Tobi Schlüter and Erik Schnetter... This brings our merry band up to four. Welcome aboard guys!

April 30

Tobi Schlüter found a bug that caused PRINT to be handled in an identical manner to WRITE. He supplied a small patch which fixed things.

Niels Jensen found an uninitialized pointer problem that triggered when an IO-list was not present in a READ or WRITE statement.

Caught up on a lot of email today. I am making progress on getting us a mail list.

I made a change worth noting-- I moved the call to the assignment matcher up near the head of the statement decoder. This was in response to bad error messages. For example:

PRINT *, ???  

This failed as a PRINT statement, and matching proceeded to the assignment matching, which tried to match an expression, succeeded until the comma, and then issued an error about a missing primary-- a weird message for a malformed PRINT expression.

The opposite situation is much more unlikely-- a bad message is issued if a really fouled up expression looks like a statement.

Another big change will start happening soon. I am going to change how the symbol attributes are stored. Again. The last change was not that satisfactory, and looking at the strange behavior of COMMON blocks convinced me that something better needs to be done.

I'm going to inhibit the uploading of new code for the next couple days, until the source compiles again.

April 29

Finished debugging the READ, WRITE and PRINT stuff well enough to remove the warning about "not debugged real well".

Replaced about two dozen syntax-error calls to g95_error with calls to a new g95_syntax_error subroutine that takes a g95_statement enum as an argument.

Added the CALL statement to the parser. This required some modifications to g95_match_actual_arglist to allow it to modify the alternate return labels possible in a subroutine call. Though J3 should have gotten rid of these, we'll handle it through the 'subroutines' returning an integer value. Directly after the CALL, we generate a SELECT statement with GOTO statements in the cases, and let the back end sort things out. Had a look at decode_statement and there aren't that many unmatched statements left. Started in on COMMON.

April 28

Niels Jensen found a bug that overwrote statements that had more than one g95_code structure. Fixed.

Talked to Katherine Holcomb today about intrinsic functions. She applied a patch sent by Niels Jensen that fixed a trivial problem in intrinsic.c. She also made other minor changes and sent a link to a neat fortran page where they compare and test several commercial compilers.

April 27

Did some more work on the PRINT, READ and WRITE statements tonight. They seem to work except for the implied-loops, which I have not tested yet. A bunch of the work tonight was adding functions that help generate code structures. g95_append_code is useful for appending a block of code structures onto a possibly NULL head.
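A helper of this kind might look roughly like the sketch below. The code_node structure and append_code() are hypothetical simplifications, assuming code structures are chained through a next pointer; the real g95_code structure and g95_append_code signature are not shown in these notes.

```c
#include <stddef.h>

/* Hypothetical, stripped-down code structure. */
typedef struct code_node {
    int op;                      /* stand-in for the statement kind */
    struct code_node *next;
} code_node;

/* Append the chain `tail` to the chain `head` and return the head of
   the combined chain.  `head` may be NULL, in which case `tail` simply
   becomes the whole chain. */
code_node *append_code(code_node *head, code_node *tail)
{
    code_node *p;

    if (head == NULL)
        return tail;

    /* Walk to the last node of the existing chain. */
    for (p = head; p->next != NULL; p = p->next)
        ;
    p->next = tail;
    return head;
}
```

Returning the head lets callers write head = append_code(head, new_block) without first checking whether the chain is still empty.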

g95_io_pointer is a new function that takes an expression node and generates a call node to the proper runtime function that passes a pointer to the runtime IO subroutines. It is very unsophisticated at the moment-- it cannot handle complex numbers, nor derived types. For constants and non-variable expressions an intermediate store also needs to be generated.

Katherine Holcomb sent a copy of intrinsic.c with the intrinsic function and subroutine names typed in. She says it has 162 function/subroutine names in total. It looks good and compiles fine. The new intrinsic.c is on the source page.

Niels Jensen sent in a patch yesterday that cleans up the Makefile quite a lot (applied) as well as a patch that replaced some test values in arith.c with the IEEE fp limits. He is also considering taking on the job of converting the gmp-based arith.c into something that uses gcc's host arithmetic emulation.

There are a pair of behind-the-scenes discussions going on about I/O libraries and floating-point emulation. Getting a mail list going is becoming an increasing priority. Watch here for more on that.

April 25

Lots of mail today and not a lot of code. I've updated the contributions page to reflect projects that people are working on.

Jos Bergervoet wrote in to report a successful compilation on the HP RISC processor under HP-UX 11.

Niels Kristian Bech Jensen wrote to report failure on:

PRINT *, 'Hello World.'  

Which obviously needs debugging. In my own code, I tend to use WRITE(6,*) statements all over the place, but I may change my style to use the simpler PRINT.

Katherine Holcomb wrote to say that she has finished entering the intrinsic names and subroutines and is now checking things and neatening things a bit.

Toon Moene wrote in about an arcane problem associated with passing a NULL pointer inside of the error-printing routines where a va_list is expected. This has been worked around.

I fiddled a little today with compiling g95 under other compilers, but then realized that I was wasting my time-- If g95 is to be linked with the gcc back end, then the gcc front end is available.

April 24

After grepping for FLAVOR_VARIABLE, which occurred in a total of six places, it turned out that moving the variable attribute to its own bit wasn't that much trouble. We can now assign to a function's return value.

Toon Moene sent some more failures in-- the READ and WRITE statements now fail correctly. I've removed the status messages printed by the IMPLICIT statement and added the RETURN statement.

I also added "#include <string.h>" to error.c because it contains a reference to strlen(). It compiled fine on my x86 linux even with -pedantic -Wall, and I can't find the prototype anywhere. I suspect that gcc is using some internal optimized version of strlen(). In other words, gcc is getting a little too smart for its own good.

Resumed work on the READ and WRITE statements that prompted the recent diversion. After much thought, ended up with a set of functions that should match a complete IO list, including the iterators.

The idea is not to build a special structure, but to generate a tree of g95_code statements, which will amount to a call to the IO-start statement, then a list of expressions, possibly involved in loops and finally the IO-end statement.

Ultimately, most of the IO statements will be replaced with similar handlers. First the existing IO structures will be filled in, and then the calls will be generated if the statement is syntactically correct.

G95 is now 14,000 lines long.

April 23

Toon Moene has successfully compiled g95 on his alpha. He noted a few problems. The first was the COMMON statement (which isn't implemented yet) didn't fail properly! Some more investigation is required on this one-- the same code fails correctly on my x86 Linux box. Clearly, something architecture dependent is happening.

A second problem with variable references was fixed when I debugged a lot of the variable reference code that I wrote a few days ago.

The last problem uncovers a bad mistake I made when looking at symbol attributes. The problem is that a function name can also be variable name. In the current implementation, a symbol cannot be a 'variable name' and a 'function name' at the same time. What has to happen is that 'variable' is going to come out of the flavor enum and be a single bit by itself. This bit will not be compatible with most flavors, but it will be compatible with a function name. Another implication is that we'll end up with symbols that still have FLAVOR_UNKNOWN, but this doesn't seem like a big deal.
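A rough sketch of the planned layout, with 'variable' pulled out of the flavor enum into its own bit. Only FLAVOR_UNKNOWN is a name taken from these notes; the other enum members, the symbol_attr structure and set_variable() are assumptions made for illustration.

```c
#include <stdbool.h>

/* Mutually exclusive flavors.  'variable' is deliberately not one of
   them any more. */
typedef enum {
    FLAVOR_UNKNOWN,
    FLAVOR_FUNCTION,
    FLAVOR_SUBROUTINE,
    FLAVOR_MODULE
} flavor_t;

typedef struct {
    flavor_t flavor;
    unsigned variable : 1;       /* independent single-bit attribute */
} symbol_attr;

/* Mark a symbol as a variable.  The bit is compatible with a function
   name (the function's own result) and with a symbol whose flavor is
   still unknown; it conflicts with every other flavor. */
bool set_variable(symbol_attr *attr)
{
    if (attr->flavor != FLAVOR_UNKNOWN && attr->flavor != FLAVOR_FUNCTION)
        return false;

    attr->variable = 1;
    return true;
}
```

Note that setting the bit leaves the flavor alone, which is how a plain variable ends up still carrying FLAVOR_UNKNOWN.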

April 22

Katherine Holcomb expressed interest in typing in all of the intrinsic subroutines found in fortran 95 (of which there are quite a few) so I've spent part of the day changing intrinsic.c around so that it becomes more of a typing project. Modified error.c a bit so that we can call g95_internal_error() before we start parsing a file.

April 20

Added g95_show_array_ref() to array.c, updated g95_show_expr() to call it. Added g95_free_ref_list() to free lists of reference structures. Modified the g95_code structure to have two expressions-- an assignment is now defined by two expressions, one of which is an EXPR_VARIABLE.

Started debugging the stuff written over the last few days. Fixed a few bugs and then traced a twice-freed memory problem to g95_match_assignment(), which needed some serious updating. Started on that, but now the check_assignment() subroutine needs to be updated to deal with things (structures) that are not necessarily single symbols.

April 19

Added g95_match_scalar_expr(), which is like g95_match_expr(), except that it requires a strictly scalar expression. Looked at all occurrences of g95_match_expr, and sorted things into one of the two classes. Also added code to reduce_stack() that causes the array-ness of an expression to be propagated upwards in the expression tree.

After that was done, finished writing match_varspec() and modified do_name() to call it. It compiles, but it surely does not work.

Before this can be tested, I need to move the code that shows an array reference (currently buried in show_expr()) to array.c as a subroutine that specifically shows array references. Other miscellaneous tasks include writing a subroutine to free a list of g95_comp_ref structures and adding it to the symbol freeing and undoing mechanisms.

April 18

Not a lot of code today, but I got rid of a lot of misconceptions I had about fortran 95 array-valued expressions. The major problem right now is that all of the expression-matching done so far is to match scalar expressions. Modification of the existing array reference matching didn't require much change.

g95_match now regards the %e code and its type-relatives to match only scalar expressions. The %E code matches either scalar or array expressions. This has the advantage that existing code that matches against %e still works. I've decided not to modify g95_match_expr() to similarly distinguish, so all direct calls to this subroutine have to be looked at. There aren't that many of them.

April 17

Started implementing the full matching of a variable.

Tobi Schlüter has put up some notes that we've been working on describing the I/O library here.

April 16

Thought long and hard about changes to the g95_expr structure. In the end, removed the EXPR_SCALAR_CONST, EXPR_SCALAR_VAR, EXPR_ARRAY_CONST and EXPR_ARRAY_VAR enums and replaced them with EXPR_CONSTANT and EXPR_VARIABLE. Array-ness is now a flag in the structure.

This solves the previously mentioned problem as follows: Anything works for the right hand side of an assignment, while only an EXPR_VARIABLE expression can be set to something, whether in an assignment statement, in a READ, or anywhere else.

The specification of a 'variable' is made more complicated by array specifications and structures. While g95_expr still holds the root symbol, the array and structure specs are held in a singly linked list of g95_comp_ref structures.
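The shape of the revised structure might be sketched like this. EXPR_CONSTANT and EXPR_VARIABLE come from the notes above; EXPR_OP, the field names, and the helper functions are hypothetical, and the real g95_expr and g95_comp_ref carry much more information.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { EXPR_CONSTANT, EXPR_VARIABLE, EXPR_OP } expr_type;

typedef enum { REF_ARRAY, REF_COMPONENT } ref_kind;

/* One link in the chain of references hanging off a variable. */
typedef struct comp_ref {
    ref_kind kind;
    const char *component;       /* component name for REF_COMPONENT */
    struct comp_ref *next;
} comp_ref;

typedef struct {
    expr_type type;
    bool is_array;               /* array-ness is a flag, not a type */
    const char *symbol;          /* root symbol for EXPR_VARIABLE */
    comp_ref *ref;               /* a(3)%b: array ref, then component b */
} expr;

/* Only a variable can appear where something is being set. */
bool can_be_assigned(const expr *e)
{
    return e->type == EXPR_VARIABLE;
}

/* Count the references applied to the root symbol. */
size_t ref_count(const expr *e)
{
    size_t n = 0;
    const comp_ref *r;

    for (r = e->ref; r != NULL; r = r->next)
        n++;
    return n;
}
```

For a reference such as a(3)%b, the root symbol would be "a" and the list would hold an array reference followed by a component reference.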

The fact that this change didn't break much existing code makes me think that this is the way to go.

April 15

Spent some time revising the specification for the I/O library, at least the part that the compiler talks to. More on this later.

Tried to finish the matcher for the read/write statements, the I/O list matcher. The write list is no problem, since it is composed of expressions. The read list, on the other hand, is composed of what are called 'lvalues' in C compiler lingo-- ie something that is acceptable on the left side of an equals sign.

Right now, the subroutine that matches assignments only takes a symbol on the left. Clearly this has to change and we might as well get it over with.

April 14

Mark Dewing sent a patch that fixed a couple problems associated with fixed source mode. The statement label column is now correctly handled by calling g95_next_char_literal() instead of g95_next_char(), which eats spaces in fixed mode. Mark also diagnosed a problem associated with failing to implicitly check for end of file while in fixed mode.

Tobi Schlüter sent a patch that fixed a problem parsing executable statements following an IF-statement. It turned out that whitespace needed to be skipped before matching each of the executables. Inserting g95_gobble_whitespace() fixed that problem.

The last several days at work I have been learning about autoconf off and on. I suspect that g95 itself will not need much from autoconf, but the IO library probably will.

April 13

My arm is getting steadily better-- I can type, but it still hurts to write with a pencil for more than a couple minutes. Minor update to the parser document.

Added the %Ls, %Is and %Cs targets to g95_match(), which match logical, integer and character symbols respectively.

Also added %Le, %Ie and %Ce, which do the same for expressions.

April 12

Still taking it real easy on my arm, but it is getting better. Tobi Schlüter has sent in a patch to match the infinite DO-statement which I overlooked. Thanks Tobi!

April 11

My arm still hurts. Worked a little on the matching of read, write and print statements, but going is slow using mostly only one arm.

Oh, almost forgot-- the g95 project has reached a significant milestone today-- some asshole tried to break into the webserver.... It's going to be a little difficult to do that with no CGI turned on :)

April 10

Nothing today. I apparently hurt my left forearm last Saturday typing in all that code. Pain is your body telling you to slow down, and I am listening....

April 9

Started in on the READ, WRITE and PRINT statements after fixing some obvious problems with yesterday's coding binge. I didn't get very far and think that it probably has something to do with getting a little burned out. I'm also not real satisfied with how matching expressions and symbols of a fixed type is done.

Right now the %C, %L and %I codes in g95_match() match symbols that are of character, logical and integer types, while %s matches any old symbol. I think the way to go is to do something like: %Cs for character symbol, %Ce for character expression and so on.

April 8

Wrote about 1,000 lines of code today, and it compiles and appears to work. Added the ALLOCATE, DEALLOCATE, NULLIFY, OPEN, CLOSE, REWIND, BACKSPACE, ENDFILE and INQUIRE statements to the parser. The statements are all pretty similar, and so it was mostly just a lot of typing. Most of the additions are in a new file, io.c.

G95 is now a bit more than 13,000 lines long.

April 6

DO statements seem to be working: block, nonblock and bad DO statements alike. The standard seems a little vague on some aspects of nonblock DO statements. Fixed a bug in match_small_literal_int().

The rest of the statements should be fairly easy to match.

April 5

While copying a large file, xena's kernel panicked in the VFS. Again. And this is with a 'stable' kernel. Grrrr. Right now, xena reminds me of a Red Dwarf quote: "It's more unstable than an Italian taxi driver stuck behind two old priests in a Skoda"...

Forgot the DO WHILE last night, so added it tonight. Added code to display do loops in show_code().

Debugged the do-matcher and finished the preliminary version of parse_do_block(). This one only understands ending a loop via an explicit ENDDO. More smarts for parse_do_block() tomorrow.

April 4

Implemented matchers for the CYCLE, EXIT and DO statements. Started on parse_do_block(). The tricky part here is how to correctly tie up a nonblock ENDDO that looks like:

    DO 10, i=1, 5
      DO 10, j=1, 6
        ...
 10 CONTINUE

What has to happen is that parse_executable() needs to watch for statements with labels. When a labeled statement is seen, we check to see if it ends a block, either correctly or incorrectly. If correctly, we return a ST_ENDDO to the caller, which has to be parse_do_block().

Similarly, parse_do_block() checks to see if the statement ends yet another DO block. If so, we return again to a caller that has to be parse_executable().
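The label bookkeeping behind this can be illustrated with a small sketch. It swaps the recursive returns between parse_executable() and parse_do_block() for an explicit stack of open DO labels, which is a simplification chosen for brevity and not how g95 is structured; do_stack, open_do() and end_do_blocks() are hypothetical names.

```c
#include <stddef.h>

#define MAX_DO_DEPTH 32

/* Labels of the currently open nonblock DO loops, innermost last. */
typedef struct {
    int label[MAX_DO_DEPTH];
    size_t depth;
} do_stack;

/* Record a new "DO <label>" loop.  Returns 0, or -1 if nested too deep. */
int open_do(do_stack *s, int label)
{
    if (s->depth == MAX_DO_DEPTH)
        return -1;
    s->label[s->depth++] = label;
    return 0;
}

/* Called when a labeled statement is seen.  Closes every innermost loop
   that ends on this label and returns how many were closed (0 if the
   label ends no loop).  Returns -1 if the label belongs to an outer
   loop while a different inner loop is still open, i.e. the statement
   ends a block incorrectly. */
int end_do_blocks(do_stack *s, int label)
{
    int closed = 0;
    size_t i;

    while (s->depth > 0 && s->label[s->depth - 1] == label) {
        s->depth--;
        closed++;
    }

    for (i = 0; i < s->depth; i++)
        if (s->label[i] == label)
            return -1;

    return closed;
}
```

For the two nested loops that share label 10, the CONTINUE statement closes both at once, which corresponds to the ST_ENDDO result being passed up through two levels of callers.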

April 3

Somewhere in the world, Edsger Dijkstra cries out in pain as a GOTO statement is implemented in another compiler...

Added g95_match_label() to match statement labels. Added a '%l' code to g95_match() that matches labels. Both straight GOTO and the computed GOTO are parsed and apparently work. The computed GOTO is actually placed into g95_code structures as a SELECT CASE block. This is the second instance of a theme which will soon dominate g95-- single statements expanding to potentially rather large pieces of code. This is opposed to a language like C, where each statement translates into a small piece of code.

Also wrote g95_match_loop_control() which matches a loop control specification, ie SYMBOL = START, END [, STEP], which will be used in the DO loop as well as READ and WRITE statements. Haven't had a chance to test it yet.

I think the next step is to continue work on the intrinsic fortran 95 statements. Once this is done, it should be possible to run g95 on actual code and have it parsed correctly, even though no object code is generated. The bulk of the remaining statements are I/O statements which all have a fairly straightforward syntax that is reminiscent of a CALL statement with optional parameters.

After that, there are the three allocation statements-- ALLOCATE, DEALLOCATE and NULLIFY, which have simple structures. And finally, the DO loop.

I've also been toying with the idea of putting stable versions of the binary on the web page for people to play with. The main development machine is currently an x86 Linux with glibc2. Any comments?

G95 is now 12,000 lines long.

April 2

Finished the SELECT CASE parser-- the tree structure for detecting overlapping case values still needs to be done. It seems to work. Also put in matchers for the STOP and CONTINUE statements. These seem to work as well. Did some reading on the DO statement. It's going to be, uhhh, uglier.

Also added code to attach statement numbers to code structures.

April 1

Worked on SELECT today, but there is a lot more than meets the eye. The gcc back end stores the choices in a binary tree when building a multiway branch statement, but that only works for integers. We have to handle strings as well.

The first use of the front-end tree is to prevent duplicate cases. Since the back end only does integers, the front end tree will also be used to generate code for a string select. While the back end will usually generate a jump table, we will need the tree to generate a series of tests and jumps. By specifically using the tree structure, a select with O(log(n)) search time can be generated as opposed to O(n).
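A small sketch of both uses of such a tree for string cases: rejecting duplicates on insertion and selecting a branch by descending the tree. Several things here are assumptions for illustration: the case_node structure and function names are hypothetical, strcmp() stands in for Fortran character comparison (which pads with blanks), case ranges are not handled, and the tree is not balanced, so the O(log(n)) bound only holds when the tree happens to be reasonably balanced.

```c
#include <stdlib.h>
#include <string.h>

typedef struct case_node {
    const char *value;           /* the CASE selector string */
    int label;                   /* branch target for this case */
    struct case_node *left, *right;
} case_node;

/* Insert a case value.  Returns 0 on success, 1 if the value duplicates
   an existing case, -1 if memory runs out. */
int insert_case(case_node **root, const char *value, int label)
{
    while (*root != NULL) {
        int cmp = strcmp(value, (*root)->value);
        if (cmp == 0)
            return 1;            /* duplicate CASE */
        root = cmp < 0 ? &(*root)->left : &(*root)->right;
    }

    *root = malloc(sizeof **root);
    if (*root == NULL)
        return -1;
    (*root)->value = value;
    (*root)->label = label;
    (*root)->left = (*root)->right = NULL;
    return 0;
}

/* Find the branch target for a selector, or -1 for the default case.
   Each comparison discards one subtree, which is the series of tests
   and jumps that would be generated for a string select. */
int select_case(const case_node *root, const char *value)
{
    while (root != NULL) {
        int cmp = strcmp(value, root->value);
        if (cmp == 0)
            return root->label;
        root = cmp < 0 ? root->left : root->right;
    }
    return -1;
}
```

The same descent serves both purposes: at parse time a hit means a duplicate case, and in generated code a hit means the branch to take.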

The fortran SELECT statement also allows indefinite lower and upper bounds to be specified. This doesn't appear to be supported by the back end, and will probably be handled by effectively wrapping the select in an IF-statement that tests those bounds.

Anyway, the basic SELECT and CASE statement matchers are done, without creating a tree. A lot of work went into simplifying the parsing subroutines-- there is now a parse_executable() subroutine which does nothing but string executable statements together and also calls the IF/SELECT/DO/FORALL/WHERE parsers when needed.

This simplifies the block parsers as well because they generally have their own special statements that they expect interwoven with executable statements.

Debugging the SELECT/CASE statement matchers also revealed a lot of problems with undoing/cleaning up after previous statements.