2000 Archive |
| December 31 |
Lots of small fixes. Fixed the problem with functions that was introduced last time. The problem had to do with creating only one symbol node for contained module functions: inside the contained function itself the name is a variable, while in the module namespace it is really a function. Another problem was the parsing of a function call of the form f(a==b). The argument matcher wasn't discarding the 'a=' as it should have. Fixed a problem with the formal arguments of statement functions being declared as 'SAVE'; the solution was to stop applying the 'DUMMY' attribute to entities in these arguments. |
| December 20 |
Looks like the fix to the function declaration parsers had some problems. No time to fix it today, as I am preparing to head home for Christmas. My brother has a DSL connection, so I may be able to get some stuff done, but I'm not going to count on a lot. I should be back in Arizona around the 30th. |
| December 19 |
Last night's fix fixes loads of things-- LAPACK now mostly compiles. Fixed another problem-- generic names have to be given the 'function' or 'subroutine' attribute, depending on the kind of procedures that their interface contains. Also added error messages for generic interfaces that contain a mixture of subroutines and functions. A while back, Michael Richmond reported a problem having to do with what attributes a function has in its own program unit. Unless a result variable is specified, the function name is really a 'variable'. This initially led me to an important fix in g95_match_actual_arglist()-- actual argument lists are the one place where procedure names can be used as expressions of a sort, with a typespec of BT_PROCEDURE. Anyhow, the problem appears fixed. |
| December 18 |
Fixed the function g95_match_optional(), which was adding the 'intrinsic' attribute. This was causing problems in a module created by the LAPACK library that prevented everything else in that directory from compiling. |
| December 17 |
Fixed a few problems left over from yesterday that I missed-- this included trying to do numeric operations on expression nodes that didn't even represent constants. Fixed a problem in the matching of structure constructors-- the components weren't being copied if they came from a parent namespace. g95_charlen structures are now written and read in modules. |
| December 16 |
Finished the modifications and hooked the new expression subroutines into the expression parser. As far as I've tested, things are back to where they were. I've checked in the changes to CVS in anticipation of the automatic test suite running later tonight. Sourceforge has upgraded their hardware, and secure-copy still doesn't quite work... I've also made some modifications to the contributions page to reflect the FSF's new assignment procedures. |
| December 15 |
Not much time tonight, but more work on the overhaul. The only things left to do are to remove the current scheme for simplifying expressions and hook the new subroutines into the main expression parser. It looks like this new scheme is going to be even more general than I had previously thought-- the stuff I am doing now will also have to handle simplification of intrinsic expressions, even those with array arguments. |
| December 13 |
Worked more on the arithmetic overhaul. Almost ready to start testing, then I can delete some cruft that has been around since day one. I did get an answer from the FSF regarding their new assignment procedures and have mailed Walter back-- he was thinking about implementing the overlap checking for SELECT CASE statements. I'll try to update the web page sometime tomorrow. I am going skiing, so it may not happen... |
| December 12 |
Whoa! A whole week just slipped by! What can I say-- I've been busy, trying to get things done that have to get done. Walter Silvestri let me know that the link to the copyright assignment forms is broken. As far as I can tell, the forms are no longer available on the web. While sane people would wonder why an organization devoted to free software would hide the legal forms needed to contribute to that organization, the FSF is more than a little given to elitist lunacy. For example, check this out, third paragraph. Anyone who gets upset about personal pronouns having a gender probably has other serious problems as well. Anyhow, I am working on the situation re assignment forms. G95 stuff tonight consisted of writing down a list of restrictions that the F language places on fortran. This is part of a revamping of the contributions page, which has been allowed to languish too long. Walt Brainerd sent me a list of these a while back, which I lost, but a more comprehensive list was on his website. |
| December 5 |
Still rearranging things within arith.c. The changes I've been working on basically reorder the simplification of expressions. It is not that important to be able to simplify every possible expression-- the back end takes care of collapsing constants in different parts of an expression tree and there is no sense duplicating this in the front end. One thing that the front end does have to be able to do is reduce an expression composed entirely of constants to the right constant at compile time-- to determine the value of a PARAMETER, for example. In the previous version, the expression parser built a tree of expression nodes. Later, a recursive simplification function was called that could reduce a constant expression to the right constant. In an initialization expression, function references bind to intrinsics automatically, unlike function references anywhere else, and the simplification function had a flag to treat function references this way. The new version switches the order a bit. When the expression parser has two summands that it needs to combine, it will call g95_add(op1, op2), which will return a new node that is the sum of the two nodes. If both nodes happen to be constants, the new node will be the arithmetic sum of op1 and op2. Later on, if the expression is an initialization expression, a stripped-down version of the current simplification function will call the right intrinsic function handler to do its thing. While it sounds like an aesthetic rearrangement, this will make arithmetic a lot easier to do, particularly within the functions that reduce intrinsic expressions at compile time. As Katherine Holcomb found out, the g95_arith_* functions are a pain in the butt to use. One particularly painful place is array constants created from array constructors. Under this scheme, you just call g95_add(), which notices that it is dealing with arrays and calls g95_arith_plus() repeatedly to do its job. 
Under the new scheme, expressions can be constructed and reduced using the same functions. While I've been planning this for a while, the real impetus was noticing that the bulk of the problems in the test suite are a failure to reduce initialization expressions. |
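The reordering described in this entry, where the function that combines two summands folds constants as soon as it sees them, can be illustrated with a minimal sketch. The `expr` struct, `add_expr()` and the helper constructors are hypothetical simplifications, not g95's actual `g95_add()`; real g95 arithmetic works on GMP values with range checking, which this sketch replaces with a plain `long`.

```c
#include <stdlib.h>

/* Hypothetical, much-simplified expression node. */
typedef enum { EXPR_CONSTANT, EXPR_VARIABLE, EXPR_OP_PLUS } expr_type;

typedef struct expr {
    expr_type type;
    long value;               /* meaningful only for EXPR_CONSTANT */
    struct expr *op1, *op2;   /* meaningful only for EXPR_OP_PLUS */
} expr;

static expr *new_node(expr_type type)
{
    expr *e = calloc(1, sizeof *e);   /* allocation failure not handled in this sketch */
    e->type = type;
    return e;
}

static expr *new_constant(long value)
{
    expr *e = new_node(EXPR_CONSTANT);
    e->value = value;
    return e;
}

static expr *new_variable(void) { return new_node(EXPR_VARIABLE); }

/* Combine two summands while the tree is being built. Two constants are
   folded into a single constant node immediately; anything else becomes
   an operator node, and folding inside larger non-constant trees is left
   to the back end. Plain long arithmetic (no overflow check) stands in
   for the real range-checked arithmetic. */
static expr *add_expr(expr *op1, expr *op2)
{
    if (op1->type == EXPR_CONSTANT && op2->type == EXPR_CONSTANT) {
        expr *sum = new_constant(op1->value + op2->value);
        free(op1);
        free(op2);
        return sum;
    }
    expr *e = new_node(EXPR_OP_PLUS);
    e->op1 = op1;
    e->op2 = op2;
    return e;
}
```

Building `a + 2 + 3` left to right with this sketch still yields an operator node, because `(a + 2)` is not a constant; only fully constant subtrees collapse, which matches the division of labor between front end and back end described above.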
| December 4 |
More work on the expression node overhaul. The changes aren't that major, but reflect some fundamental changes to parts of g95 that have been around since the early days when it started out as an expression parser. More explanations later-- it is quite late. |
| December 2 |
Andreas Schweitzer reported an internal error associated with freeing the IOLENGTH form of the INQUIRE statement. The problem has been fixed. Rob Cermak pointed out that module files are being written from modules that have errors. This is a very bad idea, since it will fool the 'make' program into thinking that a source file does not need to be recompiled. I've changed things so that a module file is not written in this case and a previously existing module file is deleted. Started overhauling the expression handling to deal with array constants and make intrinsic arithmetic easier. |
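A minimal sketch of the new rule, assuming a hypothetical `finish_module()` hook that receives the error count and the text to write; g95's real module writer is organized differently. The point is only the ordering: check for errors first, and on error remove any stale module file rather than leaving it for `make` to trust.

```c
#include <stdio.h>

/* Hypothetical end-of-module hook. Returns 1 if the module file was
   written, 0 if it was skipped because of errors, -1 on an I/O error. */
static int finish_module(const char *module_filename, int error_count,
                         const char *contents)
{
    if (error_count > 0) {
        /* Delete any module file left over from an earlier, successful
           build so that 'make' cannot mistake it for an up-to-date one.
           A failure here just means there was no old file to remove. */
        remove(module_filename);
        return 0;
    }

    FILE *fp = fopen(module_filename, "w");
    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 1 : -1;
}
```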
| November 30 |
Rob Cermak reported that last night's fixes to the KIND intrinsic caused a huge jump in the number of source files successfully parsed by g95. Mark Dewing posted the URLs of a couple of perl tools that can be used to create makefiles by reading directories of fortran 90 source files. Hopefully this will improve things even more. I came up with a third way out of last night's dilemma: I've opted for simply copying the component lists of derived types when the proper type is in a parent program unit. This in effect defines a separate but equal type. Added a couple more diagnostics to the parsing of derived types-- now, you can only define a type once! Also added command-line options -ffree-form and -ffixed-form. This came up on the mail list the other day. They cause the source file to be parsed as free or fixed form, respectively, without regard to the filename extension. |
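One way the new options could interact with extension-based detection, as a sketch: a form forced on the command line wins, and otherwise the extension decides. The enum, the function names and the exact extension rule (`.f90`/`.f95` free form, anything else fixed) are assumptions for illustration, not g95's actual driver code.

```c
#include <string.h>

typedef enum { FORM_UNKNOWN, FORM_FREE, FORM_FIXED } source_form;

/* Map one command-line argument to a forced source form, or FORM_UNKNOWN
   if it is not one of the two options. */
static source_form parse_form_option(const char *arg)
{
    if (strcmp(arg, "-ffree-form") == 0)
        return FORM_FREE;
    if (strcmp(arg, "-ffixed-form") == 0)
        return FORM_FIXED;
    return FORM_UNKNOWN;
}

/* Decide how to scan a file: a form forced on the command line takes
   precedence; otherwise fall back to the (assumed) extension rule. */
static source_form choose_form(const char *filename, source_form forced)
{
    if (forced != FORM_UNKNOWN)
        return forced;
    const char *dot = strrchr(filename, '.');
    if (dot != NULL && (strcmp(dot, ".f90") == 0 || strcmp(dot, ".f95") == 0))
        return FORM_FREE;
    return FORM_FIXED;
}
```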
| November 29 |
Fixed several minor problems found by the test suites. These included a core dump in the KIND() intrinsic. I've also changed structure I/O to generate placeholder code instead of an internal error. Generating code to print structures is going to have to wait until there is more machinery for generating code... The major dilemma of the night was the forward referencing of types-- there was a special case to allow functions to be declared of a derived type before the type was defined. It turns out this special case is actually the rule. Variables of a derived type can be declared before the type itself. With host program units this creates a problem. If a derived type variable is declared, then what does it refer to? If the type is defined later in the same program unit, then that definition is used. If no type is defined there, then it takes the type defined in a parent program unit. We've got to create the symbol node for the type when it is first used, since those typespec structures have to point to something. But it might turn out later to be wrong. Two possibilities suggest themselves. Either we make a pass through the entire namespace when the first non-declaration statement is encountered and update typespec structures to point to the right thing, or we write an accessor function that returns a symbol node given a derived type typespec node. At the moment, I'm kind of leaning toward the function or maybe just inlining something, since there aren't too many places that need to go from typespec structure to symbol node-- match_varspec() is the notable exception. |
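The accessor-function option could look roughly like this as a sketch: instead of trusting the pointer stored when the type was first used, look the type name up again, starting in the namespace where the declaration appeared and falling back to its hosts. All structures and names are hypothetical, and the sketch assumes names have already been converted to lowercase.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified structures. */
typedef struct type_symbol {
    const char *name;
    struct type_symbol *next;      /* next derived type in the same namespace */
} type_symbol;

typedef struct namespace_t {
    type_symbol *types;            /* derived types defined in this unit */
    struct namespace_t *parent;    /* host program unit, or NULL */
} namespace_t;

typedef struct {
    const char *derived_name;      /* name as written in TYPE(name) */
    namespace_t *ns;               /* namespace where the declaration appeared */
} typespec;

/* Return the derived type a typespec refers to at the time of the call:
   a definition in the declaring namespace wins, even one that appeared
   after the declaration; otherwise the nearest host's definition is used.
   Returns NULL if no such type is defined anywhere up the chain. */
static type_symbol *typespec_derived(const typespec *ts)
{
    for (namespace_t *ns = ts->ns; ns != NULL; ns = ns->parent)
        for (type_symbol *t = ns->types; t != NULL; t = t->next)
            if (strcmp(t->name, ts->derived_name) == 0)
                return t;
    return NULL;
}
```

Because the lookup happens when it is called, a definition added later to the inner program unit is picked up automatically, at the cost of a search on every call.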
| November 26 |
Changed where procedure names are stored-- they are now generally stored in parent namespaces. Debugged this and things look good-- the RK code now parses without any errors at all. Erik Schnetter reported a problem with name matching-- only 30 characters were matched instead of 31. |
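This kind of off-by-one is easy to make when the length check sits inside the scanning loop. A sketch of a matcher that accepts exactly 31 characters, the Fortran 95 limit on names; `match_name()` and its return convention are hypothetical, not g95's matcher.

```c
#include <ctype.h>

#define MAX_NAME_LEN 31   /* Fortran 95 limit on the length of a name */

/* Match a name at the start of 'src': a letter followed by letters,
   digits or underscores, at most MAX_NAME_LEN characters in all. Copies
   the name into 'buf' (which must hold MAX_NAME_LEN + 1 chars) and
   returns its length, 0 if no name starts here, or -1 if it is too long. */
static int match_name(const char *src, char *buf)
{
    if (!isalpha((unsigned char) src[0]))
        return 0;
    int len = 0;
    while (isalnum((unsigned char) src[len]) || src[len] == '_') {
        if (len == MAX_NAME_LEN)
            return -1;             /* a 32nd name character: too long */
        buf[len] = src[len];
        len++;
    }
    buf[len] = '\0';
    return len;
}
```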
| November 23 |
Debugged the saving and loading of interfaces within a module. It appears to work. The really depressing thing about working on this is the realization that almost no one will use these dark corners of fortran 95... |
| November 22 |
Added private and public attributes to operator interfaces. These control whether these definitions are exported from a module or not. I also changed the public and private bits in the symbol_attribute to a single bitfield-- this lets us easily test for ACCESS_UNKNOWN instead of requiring that two bits both be zero. I've run across a case where it will be necessary to save PRIVATE symbols. Consider: module a Because g is exported with the module, g1 can be executed, even though it is not "accessible"-- "call g1" gives an error, while "call g(.TRUE.)" links and runs as expected. What will have to happen in this case is that the local name of g1 in the new namespace will have to be something that is not a legal name, so that it cannot be referenced directly. |
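A sketch of the single-field idea, with the attribute struct reduced to the bits that matter here. Only `ACCESS_UNKNOWN` comes from the entry; the other enum values, the struct layout and `access_unknown()` are assumptions.

```c
#include <stdbool.h>

/* One two-bit field instead of separate public and private flags. */
typedef enum { ACCESS_UNKNOWN = 0, ACCESS_PUBLIC, ACCESS_PRIVATE } access_type;

typedef struct {
    unsigned access : 2;      /* holds an access_type value */
    unsigned dummy : 1;       /* stands in for the other attribute bits */
} symbol_attribute;

/* With two separate bits this test would be !attr->public && !attr->private. */
static bool access_unknown(const symbol_attribute *attr)
{
    return attr->access == ACCESS_UNKNOWN;
}
```

A side benefit of the single field is that a symbol can no longer be recorded as both PUBLIC and PRIVATE at once, a state two independent bits would allow.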
| November 20 |
Checked in a couple of bug fixes. I've added an option from g77, -fdollar-ok, which allows dollar signs in entity names. |
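As a sketch, the option only needs to widen the set of characters accepted after the first character of a name. The flag variable and `is_name_char()` are hypothetical; this is not g95's or g77's actual scanner code.

```c
#include <ctype.h>
#include <stdbool.h>

/* Set by a (hypothetical) command-line handler when -fdollar-ok is given. */
static bool dollar_ok = false;

/* Can 'c' appear after the first character of an entity name? */
static bool is_name_char(int c)
{
    if (isalnum((unsigned char) c) || c == '_')
        return true;
    return c == '$' && dollar_ok;
}
```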
| November 19 |
The g95_interface structure has been eliminated, and interfaces are now handled by linking the g95_symbol nodes together in lists. The changes were not that extensive and have been debugged. Did some work on handling the PUBLIC and PRIVATE attribute statements within modules-- these attributes alter use-associated symbols. The larger change is that the access mode is not saved to symbols in a module-- in a situation where a module is used by another module, private symbols don't make it into the second module anyway, and their future accessibility depends on the second module. Support was also added to allow a name defined by a MODULE PROCEDURE statement to be a function name (subroutine support is already there). I've patched Katherine Holcomb's work on intrinsic.c so that the selected_real_kind() intrinsic always returns the default real kind for now. This will allow lots of code to be parsed without error. |
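A sketch of an interface represented as a plain list of symbol nodes, with a walk over the list that also shows how a check like the later function/subroutine mixture error could be written. The struct and function names are hypothetical, and the sketch assumes each symbol sits in at most one interface list, since it has a single link field.

```c
#include <stddef.h>

/* Hypothetical symbol node with a link to the next procedure in the
   same interface; the interface itself is just a pointer to the head. */
typedef struct symbol {
    const char *name;
    int is_function;                  /* 1 = function, 0 = subroutine */
    struct symbol *next_in_interface;
} symbol;

/* Push a procedure onto the front of an interface list. */
static void interface_add(symbol **head, symbol *proc)
{
    proc->next_in_interface = *head;
    *head = proc;
}

/* Returns 1 if every procedure is a function, 0 if every one is a
   subroutine, and -1 for a mixture or an empty interface. */
static int interface_kind(const symbol *head)
{
    if (head == NULL)
        return -1;
    int kind = head->is_function;
    for (const symbol *s = head->next_in_interface; s != NULL;
         s = s->next_in_interface)
        if (s->is_function != kind)
            return -1;
    return kind;
}
```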
| November 15 |
Katherine Holcomb has checked in a large patch to intrinsic.c that is a first stab at implementing intrinsic functions, at least within the compiler itself. A problem with my thesis project is currently interfering with progress on g95. |
| November 13 |
Started an overhaul of how interfaces are handled. In particular, it appears that the g95_interface is not strictly necessary-- I think interfaces can be done by linking symbol nodes together without using another structure. No checkin tonight, and probably not for a few days. |
| November 11 |
Lots more bug fixes all over, with the idea of getting the problem count in the regression tests down. Added reference counting to symbol nodes. |
| November 10 |
Lots of bug fixes in diverse areas, fixing problems found in the regression tests. Once the error count is way down on these, it will be easier to spot a new problem that has been introduced inadvertently. |
| November 9 |
Worked yesterday and today on interfaces. This is a digression from modules in the sense that interfaces and such should be fully supported before they are saved/restored. One thing that has become clear is that symbol nodes are going to have to be able to reside in more than one namespace. For example, a name associated with a module procedure has to live in the module's namespace because all of the contained program units have to be able to find it. It also has to reside in the subprogram's namespace because that name cannot be reused within the contained namespace. The same holds for contained program units within program units. The upshot is that reference counts are going to be needed in symbol nodes so that they can be correctly freed. This is also necessary for symbols that are referenced more than once through use association. |
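A minimal sketch of reference counting for symbol nodes shared between namespaces. The names and fields are hypothetical; the idea is only that each namespace holding the node adds a reference and freeing a namespace drops one, so the node is freed exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol node that may be entered in several namespaces
   (module, contained subprogram, use-associating units). */
typedef struct symbol {
    char name[32];            /* 31-character Fortran name + NUL */
    int refs;                 /* how many namespaces point at this node */
} symbol;

static symbol *symbol_new(const char *name)
{
    symbol *s = calloc(1, sizeof *s);
    if (s != NULL)
        strncpy(s->name, name, sizeof s->name - 1);  /* calloc zeroed the rest */
    return s;
}

/* Called whenever another namespace starts pointing at the symbol. */
static void symbol_ref(symbol *s) { s->refs++; }

/* Called when a namespace that points at the symbol is freed. The node
   is freed only when the last reference goes away. Returns 1 if the
   node was freed, 0 otherwise. */
static int symbol_release(symbol *s)
{
    if (--s->refs > 0)
        return 0;
    free(s);
    return 1;
}
```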
| November 7 |
Watched the election tonight. I am something of a political junkie although I didn't have time to back anyone in particular this time around. |
| November 6 |
Applied a few more patches sent by Niels Jensen relating to matching deleted features. Started work on improving handling of interfaces-- nothing is checked in yet. |
| November 5 |
Applied patches sent by Niels Jensen that match some of the deleted features of fortran, specifically the ASSIGN statement, the assigned GOTO and the H edit descriptor. |
| November 4 |
Worked on fixing some recurring problems with the format checking that one of my f77 test codes kept complaining about. The problem was one we've worked on before-- vetting formats that are strings (not in format statements). I essentially changed the code back to the way it was, which is to say reading from the string instead of the source file. Reading from the source file allowed printing an error locus, but had a couple of problems. The first was that format strings can be calculated by concatenating several strings together. In this case, our read-the-source method failed. The second was that it was complicated to convert the source file to a string, getting the escaped characters right and so on. So I switched things back to the way they were originally. The downside is that there needs to be a better error message for such strings-- we can't use the usual error reporting mechanism to highlight the problem. I also worked on fixing some problems with the scanner. Tobi added some code a while back that ate end-of-line comments in fixed mode. I've added some analogous code in free mode. I also enforced the standard's requirement that, when a line is continued in a character context, the '&' must be the last character on the line. |
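A sketch of free-form end-of-line comment removal. The one subtlety is that a '!' inside a character constant is not a comment, so the scan has to track whether it is inside quotes. `strip_free_comment()` is a hypothetical name, and this simplified version ignores character constants that are continued across lines with '&'.

```c
/* Truncate a free-form source line at an end-of-line '!' comment.
   A '!' inside a character constant ('...' or "...") is not a comment.
   A doubled delimiter such as '' inside a constant closes and immediately
   reopens the constant, so simple toggling handles it correctly. */
static void strip_free_comment(char *line)
{
    char quote = 0;                      /* active delimiter, or 0 outside strings */
    for (char *p = line; *p != '\0'; p++) {
        if (quote) {
            if (*p == quote)
                quote = 0;               /* end of the character constant */
        } else if (*p == '\'' || *p == '"') {
            quote = *p;                  /* start of a character constant */
        } else if (*p == '!') {
            *p = '\0';                   /* comment: drop the rest of the line */
            return;
        }
    }
}
```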
| November 3 |
Worked on implementing the 'ambiguous' bit. After some thought, I ended up making g95_get_symbol() more elaborate, rather than trying to remember to check for ambiguity every time it is called. The difficulty was that g95_get_symbol() returns a symbol and the ambiguous bit has to be stored in the intermediate symtree. An entity (symbol) can have an ambiguous reference to it, but there could also be a perfectly clear reference to it by another name. Since the ambiguous bit has to be stored in the symtree, the logical time to check it is when we are searching for the symbol itself. This then caused the problem that g95_get_symbol() must be able to return a failure condition-- before, it always worked and returned a symbol node, even if it had to create the new node. This meant going through and changing all the places where it was called. I've also added a use-associated bit in the symbol attribute structure-- there are function resolution rules and other conditions that treat use-associated variables differently from other variables. In particular, the attributes of use-associated variables can't be modified after the USE... |
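A sketch of the lookup change: the ambiguity flag lives on the name-to-symbol entry, so the lookup checks it there and needs a way to report failure. The types and the `lookup_status` enum are hypothetical; unlike `g95_get_symbol()`, this version returns a NULL symbol for an unknown name instead of creating a new node.

```c
#include <stddef.h>
#include <string.h>

typedef struct symbol { const char *name; } symbol;

/* Hypothetical symtree entry mapping a local name to a symbol. The
   ambiguous flag is kept here, not in the symbol, because the same
   entity may also be reachable unambiguously under another name. */
typedef struct symtree {
    const char *name;
    symbol *sym;
    int ambiguous;
    struct symtree *next;
} symtree;

typedef enum { LOOKUP_SUCCESS, LOOKUP_FAILURE } lookup_status;

/* Look up 'name'. On success, store the symbol in *result (NULL if the
   name is not present). Fail if the name is an ambiguous reference. */
static lookup_status get_symbol(symtree *list, const char *name, symbol **result)
{
    for (symtree *st = list; st != NULL; st = st->next) {
        if (strcmp(st->name, name) == 0) {
            if (st->ambiguous)
                return LOOKUP_FAILURE;
            *result = st->sym;
            return LOOKUP_SUCCESS;
        }
    }
    *result = NULL;
    return LOOKUP_SUCCESS;
}
```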
| November 2 |
Fixed more bugs in module reading and writing. On the first day I started g95, I downloaded a Runge-Kutta code that was about 16k lines long. About two months ago, it parsed after editing the modules out. Now it parses without any help. The only thing that is USEd is an integer parameter that determines the overall kind. But it works! Just for fun, I checked out how some other f90 compilers do on this RK code. The module file left behind by g95 was 44k long. I figured that an ASCII format would be less efficient than a binary one, but rationalized that disk was cheap. The IBM xlf90 compiler left behind a module that was 440k long. SGI f90 crashed with an internal error, as did PGI. Compaq fortran left a 160k module. To be fair, g95 is not writing everything it needs to, in particular interfaces. On the other hand, I can't see the module expanding by more than a factor of four, so g95 modules would appear to be efficient with space. And they could be made smaller without a lot of trouble. I've also updated the binary. I've added a link to Rob Cermak's g95 regression results page in the "links" section. This will make it a lot easier for me and others to find the page. Rob posted an update on the mailing list that is worth repeating: Additions: |
| November 1 |
Fixed a couple of module-reading bugs. G95 can now read simple modules. |
| October 31 |
Rob Cermak let me know that Ben Turner has been the one tracking down all of the fortran 90 packages mentioned in the emails of several weeks ago. Thanks Ben! I was reading an article yesterday that appeared in the Physical Review of 1955. The article was about a molecular binding calculation that is relevant to my thesis. To my great surprise, the author acknowledged "J. Backus" of IBM for his help in programming the IBM 701 used in the calculation. This was the time that he was actually involved in writing the first fortran compiler, although my source on the history of fortran says that it ran on the 704... As far as g95 is concerned, the subroutines that read a module are now called by the parser and debugging has started on reading. |
| October 30 |
Tobi Schlüter sent a patch that allows the module subroutines to operate on files in directories besides the current. I also checked out Rob Cermak's automatic regression results. He's added quite a lot of fortran 90 programs that are checked nightly. I've added most of the rest of the code necessary to read a module. It compiles but is not tested at all yet. |
| October 29 |
I spent the first part of the day untangling the g95_array_spec structure from the symbol and component structures. Now these structures point to a g95_array_spec structure instead of containing it. More work on modules-- checked in lots of changes. Simple modules (i.e., just variables) are now written correctly. |
| October 28 |
Lots of changes today. The module-writing subroutines are now called from the top level after a module has been parsed. A simple module consisting of two reals doesn't quite work yet, though. I'll try to check things in tomorrow when it has a better chance of working... |
| October 26 |
No code today, but lots of thought on how the symbol table should be structured so that it will work correctly. The file 'modules' has been checked into the doc subdirectory. Stan Whitlock of Digital, whom I met at J3, mentioned at one point that they had to rewrite their implementation of modules three times. I can see why... |
| October 25 |
Started working on top-level read and write subroutines for modules. It's clear that some more thought needs to go into what has to happen. Michael Metcalf mailed me about half a meg worth of test cases today. I've written an article for his Fortran Forum magazine about the evolution of free fortran compilers that should appear in the next issue. |
| October 24 |
Modified symbol table subroutines for handling modules. Also checked in the rather massive changes that have been made to module.c over the last week. I've also added an option for parsing the F subset of fortran 90/95. |
| October 20 |
Added reading and writing of GMP integers and floats. The code is still not checked in yet. Instead of finishing the rest of the low-level stuff, I am going to finish the high-level things first (i.e., saving whole namespaces) so that some debugging can start to take place. A couple of days ago, I started work on a new program that does a calculation that I've been wanting to do for a while. It's complicated enough that fortran 77 won't cut it-- I really need structures and recursion to tackle this problem, so I am currently writing my first program in fortran 90. It was either Kernighan or Ritchie who said that the best way to learn a language was to write programs in it. Writing a compiler is not a bad way either.... |
| October 18 |
More on reading modules. Added reading/writing of iterators, constructor lists, various constants. Still lots of things to be done here. |
| October 16 |
OK, back. A number of problems were resolved with my "day job" this morning, and I can feel a g95 binge coming on. I spent some time tonight working on modules. There are sure a lot of data structures that have to be loaded and saved. Most of them are now coded, but the worst, the symbol nodes themselves, are going to be last and will involve some changes in how they are stored in a red-black tree. Nothing has been tested yet (or checked in), mostly because I/O with symbol nodes will end up happening first when a namespace is saved. The interlinking of all these different subroutines also tends to make this an "all or nothing" sort of proposition. |
| October 11 |
Disposed of a bunch of bug reports. Fixed g95_match_init_expr() so that it would recognize constant structure and array constructors. |
| October 8 |
Tobi Schlüter added a -I option that specifies the directories that g95 should search when looking for files. Worked more on modules, making an initial checkin. It compiles, but nothing is actually called yet. |
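Searching the `-I` directories might look like this as a sketch. The current-directory-first order, the fixed path buffer and the `open_included()` name are assumptions, not a description of Tobi's patch.

```c
#include <stdio.h>

/* Try to open 'name' in the current directory first, then in each
   directory given with -I, in command-line order. Returns NULL if the
   file is not found anywhere. */
static FILE *open_included(const char *name, const char *const dirs[], int ndirs)
{
    FILE *fp = fopen(name, "r");
    if (fp != NULL)
        return fp;

    char path[4096];
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t) n >= sizeof path)
            continue;                        /* path too long: skip this directory */
        fp = fopen(path, "r");
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```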
| October 7 |
Worked on modules today. No checkin yet. Sourceforge seems to be having troubles accepting scp connections.... |
| October 6 |
Tobi Schlüter fixed a bug that prevented common blocks from being present in interface blocks. It took me a while to convince myself that this is legal but as far as I can tell it is. What use is a common block inside an interface block? Tobi also fixed a typo reported by Vikram bir Singh. After a couple days of slacking, I have started on the subroutines needed to read a module. Several subroutines that are now involved in writing the debug information will probably be moved from symbol.c to module.c so that reading subroutines can be right next to writing subroutines. |
| October 3 |
Reworked the parser's handling of end of file. Michael Richmond pointed out a problem in this area a while ago when an error was not being generated. The parsing subroutines used to return a flag that indicated whether an unexpected EOF was found by a callee, and this value had to be propagated up the stack. Now we just longjmp() out of trouble-- the unexpected EOF is a show-stopper as far as the compiler is concerned. |
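A minimal sketch of the `setjmp()`/`longjmp()` pattern: the top-level parse routine sets a jump point, and any routine nested below it can abandon parsing on an unexpected EOF without every caller checking a flag. The parser functions here are hypothetical stand-ins.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf eof_buf;              /* where to resume after an unexpected EOF */

/* Called anywhere deep in the parser when the input runs out early. */
static void unexpected_eof(void)
{
    fprintf(stderr, "Error: Unexpected end of file\n");
    longjmp(eof_buf, 1);             /* never returns */
}

/* Hypothetical parser routines nested a few calls deep. */
static void parse_statement(int at_eof)
{
    if (at_eof)
        unexpected_eof();
    /* ... normal statement parsing ... */
}

static void parse_program_unit(int at_eof)
{
    parse_statement(at_eof);         /* no EOF flag to pass back up */
}

/* Top level: returns 0 on success, 1 if an unexpected EOF aborted parsing. */
static int parse_file(int at_eof)
{
    if (setjmp(eof_buf) != 0)
        return 1;                    /* arrived here via longjmp() */
    parse_program_unit(at_eof);
    return 0;
}
```

The usual caveat applies: local variables of the function that called `setjmp()` must be `volatile` if they are modified before the `longjmp()` and read afterwards. This sketch sidesteps that by not reading any such variable.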
| October 1 |
Rob Cermak has gotten his automatic test suite started. I checked out the first couple of problem reports. Several of the tests were not correct, but there were also a few problems found, some of which had to do with yesterday's changes to function representation. Fixed a few problems found by Michael Richmond some time ago. |
| September 30 |
Finished and debugged changes to how functions and procedures are stored within symbol nodes. Hopefully things are a little closer to "right". I am getting used to using CVS at home. Michael Richmond sent a mail a while back reporting that a SEQUENCE statement inside of a type declaration was improperly flagged as an error. The PRIVATE property was incorrectly flagged as well, so I am thinking I just got things backwards in my mind at the time. Also greatly expanded the number of conflicts that are detected. Rob Cermak requested that g95 return error codes for regression testing. The codes are:
0: All went well
He also sent a patch to print the number of errors and warnings at the bottom of a g95 run. This is especially useful if the -v switch is used. Bill Wendling suggested a random fortran program generator for testing that Mark Dewing promptly implemented in python. The program can be found here with a template file here. The basic idea is to start with a template file from which the generator produces lots of pseudorandom fortran programs that can be tested overnight. |
| September 27 |
Finished the symbol documentation; it is located in doc/syms in the repository. I modified the source to conform to the document; the biggest change is how functions and subroutines are represented. The code compiles, but it probably does not work. |
| September 25 |
Worked more on the symbol document; it will be ready for prime time soon. I've discovered some misconceptions that I had about symbols. For instance, if a name is a defined operator, it can be anything else at the same time... While it would have been nice to get this right at the start, one is not aware of all the issues at the beginning. In other news, the FSF has received a copyright assignment from Michael Richmond... Hooray! |
| September 24 |
Michael Richmond sent a bug in regarding a problem with dummy procedures. Since this is the third or so time that I've had to try and get this right, more thought was clearly required. The problem of representing dummy procedures (vs real ones) led to the realization that the g95_symbol structure has started to get out of hand. I've therefore started working on a document that describes this rather central structure. |
| September 23 |
I've activated Sourceforge's CVS repository and imported the current sources into it. The direct link to the CVS archive is here (the main menu has also been changed). Also fixed a broken link to the J3 website. |
| September 22 |
Driving into Las Vegas at night is a treat. Phoenix is not so great-- there are too many hills in the way for you to see any of it before you are actually in it, but Las Vegas is different. Approaching from the southeast, you can see it about seventy miles out, a glow over intervening hills. You can tell it is Las Vegas by the searchlight built into the pyramidion of the Luxor Hotel. After passing over Hoover Dam, you go down and around on a twisty road, eventually coming to a pass. After coming up a small hill beyond the pass, you can suddenly see the whole place, all lit up, spread out over a huge valley. Finding the hotel was no problem, since it was located close to the Strip, and that was easy to spot from up on that hill. I had a little problem registering, since they managed to mangle my name in a way that I actually haven't heard before. The hotel was a nice place-- lots of room, and a free breakfast in the morning. From the closing business, I understand that J3 is going to have several more meetings there. The meeting itself had about sixteen attendees. About a third of the members were from vendors-- Compaq, Sun, Intel, Cray, HP, and NAG. The other members were from all over-- NASA, JPL, a couple of university people and other companies. The process of creating a new language specification amounts to writing a large and complicated book. People who want a new feature, or just want to clear something up, propose a "paper", which is given a serial number and written up under that number. The paper goes to a subgroup that has to pass the paper by vote before it is reported back to the full committee. At the last meeting there were four subgroups, among them "data", which deals with the main f2k language issues, "interop", which is currently finalizing the interoperability with the C language, and "interp" (interpretation), which interprets the holes in fortran 90/95 and tries to clear up f2k as appropriate. 
Once a paper has been passed by a subgroup, the author finalizes any edits and puts printouts on the table at the end of the day for people to read for a vote the next morning. It's a grueling schedule: meeting and talking all day, reading and writing half the night. I heard several complaints from people who were unable to satisfy their urges for compulsive drinking and gambling. Most of the committee had already heard of the g95 project and I let them know where we stand at the moment. Several of them wished us luck. I talked little about how g95 works internally. The guy from Sun asked me how many compilers I'd written before. I had to say "Uhhh, none, this is my first one". I've also absorbed enough fortran 95 to understand most of what was going on, contribute in small ways, and even participate in a few "straw votes", which are nonbinding votes used to get a sense of where people stand on issues-- they are the main reason that most things pass by unanimous consent. The discussions were very amicable. Unlike a lot of other gatherings, the people there were willing to be persuaded. It was explained to me that corporate representatives are occasionally required by their companies to vote a particular way, but my impression is that the companies mainly want a representative just so they know where the leading edge of fortran is, as opposed to defining it. As far as I could tell, everyone was there because they wanted to be and were all interested in making sure that f2k is as good as it can be. As far as technical issues go, f2k is becoming a huge language. People were saying that f2k is to f90/f95 as f90 was to f77. New things include polymorphism, constructor and destructor functions, user-definable data transfer functions that are called during I/O depending on what is being output, stream I/O (no records) and interoperability with C. I'm probably missing things that weren't discussed at that particular meeting. 
But there is a lot of stuff here. Another thing I got out of the conference was a better understanding of the idiosyncratic nature of fortran. For example, in C, you can do something unexpected like: *strchr(string, 'x') = '\0'; because strchr() returns a perfectly good character pointer, which can be dereferenced by the '*'. In fortran, the expression (/ 1, 2, 3, 4 /) is an array constant, but inside a loop like do i=1, 4 you can't write (/ 1, 2, 3, 4 /)(i), because subscripts are only allowed after a named variable, not after any old array expression. And there are a lot of similarly weird restrictions. What has happened is that the language has just had a lot of things stuck on it gradually, unlike the evolution of C before it became standardized. I had a really good time and will head back when I can. |
| September 19 3:00pm MST |
Michael Richmond pointed out some debug code left in the format parser, which has been removed. There will be no updates until the weekend. The car is gassed up, I'm packed and heading off to the Fortran J3 meeting in Las Vegas. I'll give a report of some sort when I return. |
| September 17 |
The long-standing problem of correctly pointing to errors in long format strings has been fixed. Format strings, whether they appear in FORMAT statements or in string constants, are now read directly from the source file. This allows an accurate error locus to be used. |
| September 16 |
Fixed the last of the problems that Michael Richmond reported a while ago. Also overhauled the ENTRY matcher, changing how an ENTRY is stored. Michael also reported that alternate return labels in a CALL statement were not causing the target label to be marked as "referenced". This has been fixed. Dan Nicolaescu reported a core dump caused by copied code that converted a real to an integer. Updated the link to the X3J3 committee's website. They're at www.j3-fortran.org now. |
| September 14 |
Michael Richmond sent in a fix for the crash reported the other day by Ian Watson. The problem was with the fragment: SUBROUTINE FOO(I). The parser complained about a premature end of file and tried to use the %C code, which tells the error-printing subroutine to insert the current locus. The problem was that the current locus wasn't pointing anywhere. Now we just print the filename of the offending file. |
| September 12 |
Finished separating the matching of array references and comparison of the reference to the specification, started debugging it a bit. I'll try it out on the code that actually caused the separation tomorrow. |
| September 11 |
Boy, two-year-olds can sure be a handful... Michael Richmond sent in an email a week and a half ago detailing some problems. The first dealt with how a name marked as EXTERNAL was interpreted. We actually have to check the character following the name to see whether it is a function call or a procedure variable. The other problem he pointed out is causing a lot more work. I had thought that an array reference could only follow an array specification. This isn't quite true. The counterexample is: EQUIVALENCE (NX(1), X). This means that matching an array reference has to be decoupled from comparing an array reference with an array specification... |
| September 7 |
Unfortunately, nothing on g95 for the next couple of days. I am going to visit my nieces and nephew this weekend. |
| September 6 |
Started working on raising numbers to integer powers. This required a generalization of g95_int_expr() to create expression nodes of any numeric type. The new function is g95_constant_expr() and replaces g95_int_expr(). |
| September 4 |
Tobi Schlüter sent a patch that fixed an incorrect error when making sure the ADVANCE tag wasn't mixed with list-directed I/O. |
| September 3 |
Tobi Schlüter sent a patch that fixed the KIND of a complex number when one of the components was a literal that looked like an integer. Normally I deal with mail in more or less chronological order, but patches get priority. Bill Wendling sent a patch that fixed a fall-through in the switch() that controlled resolution-- the case for DO resolution was falling through to the ALLOCATE resolution. Dan Nicolaescu pointed out a few days ago that trying to print a variable of type COMPLEX failed with an internal error. I've fixed this, but the corresponding fix for derived types is going to have to wait until we get into code generation. Jos Bergervoet sent in a problem with the RESULT keyword in function declarations. The return code of the subroutine that actually set the 'result' attribute was incorrectly checked. He also pointed out that array constructors weren't being parsed correctly. This was due to setting the wrong expression type, and also uncovered the problem of not freeing the constructor itself. All fixed. I've also taken the time to split expr.c, which was becoming quite large, into two smaller pieces. The first piece is a new file, primary.c, which takes care of matching primary expressions like integers, reals, complex numbers, logicals, array constructors, etc. The second piece consists of the remaining functions in expr.c, which handle lots of non-matching expression things like allocating, freeing, copying, resolving, simplifying and converting. The other task I started on was a cleanup of the prototypes in g95.h. It's been a long time since I've done this, and a lot of the prototypes were out of order, in other source files or just plain no longer used. I only got through symbol.c, and there are still probably 2/3 of the source left. In short, lots of changes all over today. |
| August 31 |
Michael Richmond wrote in three days ago to point out some problems, some of which had already been noticed and fixed. Of those that hadn't, he noted that g95 warned when (insignificant) spaces were truncated as a line was read. I've changed this to warn only if nonspace characters are seen. He also pointed out that ENTRY names within a FUNCTION are not given the "variable" attribute that they need (and I'd guess that the RESULT keyword parsing probably isn't there either). It's also obvious that more work is going to be needed on the ENTRY statement. |
| August 30 |
Fixed the (a.or.b.and.c) problem first reported by Dan Nicolaescu... It took about ten minutes of staring alternately at the code and the standard, and the fix involved replacing an "and" with an "or". The last problem had to do with a code fragment beginning DO 20 J=LL,L4. As far as I can tell from the standard, the GOTO statement is not allowed to be part of the statement that terminates a nonblock DO-loop. If someone knows otherwise, let me know. I also had a chance to look at Katherine Holcomb's progress on adding to intrinsic.c. She's got range-checking done for a few of the intrinsics, and is also working on type conversion intrinsics like CMPLX. It looks good so far, and she had a couple of questions that I answered (I probably should have cc'ed the answers to the mail list). |
| August 29 |
Jos Bergervoet reported a successful compile of g95 under Solaris (64 bits!) and HP-UX, and pointed out that I forgot to mention which version of gmp is/will be needed. Marc Dejardin pointed out a problem with my INCLUDE fix the other day-- I had applied the case preservation to the keyword itself. Internally, the keyword is stored in lower case, so an upper case INCLUDE line was not found... Dan Nicolaescu sent two more bugs, one of which I fixed. The other problem has to do with parsing a logical expression of the form if (a.or.b.and.c) stop. I didn't have time to fix this tonight. I am currently about three days behind on g95-related mail. Katherine Holcomb sent an update to intrinsic.c that I'll look at next. |
| August 28 |
Alaeddin Aydiner wrote in pointing out some problems-- g95 didn't correctly back up when an I/O list failed to match, and there was a problem with array matching (and, as it turned out, printing). Martien Hulsen found a problem with the INCLUDE statement-- it was folding case, thereby mangling the filename into what could easily be a different filename. This problem is also present in MODULE names and the USE statement-- some special handling is needed for these "symbols". He also pointed out that the matcher for iterators would not accept a space after the start expression-- that problem has been in there for a while. I've made the iterator matching much more robust. Dan Nicolaescu reported another subtle problem: a program containing REAL*8 Z(M) produced a complaint about duplicate application of the integer attribute. The problem was that g95 was assuming that untyped variables could be given their default type when first seen. This usually works, but the above program provides a counterexample. Dan has actually been parsing spec95 with g95. He reports that we have under a dozen distinct errors left. |
| August 27 |
Dan Nicolaescu wrote in pointing out a problem with my fix-- the complex constants were still being matched too aggressively. I added code to the error handler to allow pushing and popping of error messages, so that a matching subroutine can generate its own errors and still decide to return MATCH_NO. |
| August 26 |
Bill Wendling sent a patch that adds parsing and printing of substring constants and cleans up a few comments. Thanks Bill! Laurent Klinger reported three bugs-- symbolic constants inside of complex constants were not handled correctly, and namelist variables on the right hand side of a NML= tag were not recognized for what they were; both have been fixed. The other problem he found was that symbols are not being created in the right namespaces... this is a big problem which I am going to wait on. Michael Richmond reported a couple of problems that have been fixed: tabs were not always being expanded in the statement label region for fixed-form source, and the blanket form of the SAVE statement was not implemented at all. He was also the first to point out a problem with the parsing of the data-transfer statements: READ(10,100) X. The problem is that (10,100) is a perfectly good complex constant, and was being parsed as a unit number in the form of the READ statement without the I/O control list. This worked fine when I originally implemented the I/O matchers, because matching complex constants wasn't implemented yet :). The data transfer statements are now handled the same way as the other I/O statements-- first we check for a '(' and a subsequent control list. If the '(' is not found, then the alternate form is checked for. Michael also pointed out that a substring reference of an otherwise unknown symbol ran into problems because g95 decided too quickly that the substring reference was really a function reference. This is now fixed, but I suspect the algorithm for deciding what an otherwise unknown symbol is may have to undergo more revision. Also fixed a problem he pointed out regarding FORMAT and ENTRY statements that preceded the specification statements. The last problem was that g95 required another edit descriptor after the 'P' descriptor. At his request, I've also added a -ffixed-line-length-80 option, which duplicates the functionality of the g77 option of the same name. 
Added resolution functions for subtype references and array references. Added several constraints relating to the data-transfer statements. Added an optional warning that lets the user know that a source line has been truncated due to being too long. Dan Nicolaescu found a problem with the error printing in the format-checker, which is now fixed. He also uncovered the fact that g95 doesn't regard a constant raised to an integer power as an initialization expression. Dan also sent a problem regarding a single-statement IF-clause. The clause in question had a PAUSE statement as its action clause, which has been removed in fortran 95. The real problem was a horribly misleading error message that has been fixed. Marc Dejardin pointed out a bug in how g95 handles comments. Fixed g95_match_eos() so that it eats any trailing comment in the current line that it is parsing. The comments are slightly tricky-- in free mode, a 'c' at the start of the line does *not* start a comment. G95 is 21,000 lines long. |
| August 23 |
Laurent Klinger, Dan Nicolaescu and Michael Richmond were kind enough to send in a couple of bug reports. Laurent reported a successful compile of g95 on a Sun Ultra3 running Solaris 2.6. I fixed a problem noted by Dan having to do with the IMPLICIT statement. He and Michael both noted problems with the READ statement-- I looked at the code, and somehow I deleted the part that looks for a unit number, no doubt a couple days ago when I was messing with the ordering of parsing IO tags. I haven't had a chance to deal with all these reports yet, but I will get to them. |
| August 22 |
Toon Moene's copyright assignment has been received by the FSF. Hooray! |
| August 21 |
Bunch of web stuff tonight. The links on the source pages to the individual files appear to have been broken but are now fixed. I have uploaded a Linux x86 binary linked against glibc2, so that non-hackers can beat on g95. |
| August 19 |
Niels Jensen let me know (again) that I/O statements were leaving some unwanted symbols in the symbol table. This happened when g95 tried to match a unit number when there was none. For example: OPEN(FORM="formatted", UNIT=6, FILE='/dev/null') matched a variable reference named 'FORM', then bombed at the equals sign. The expression was freed, but there was still a symbol named FORM in the symbol table. I've fixed this by changing the ordering of the matching so that tags are matched first. These only leave symbols if the '=' sign is seen, so no unwanted symbols are created. This has actually made things cleaner. Added parsing of the IOLENGTH form of the INQUIRE statement. This will end up generating calls to the I/O library which will count the number of characters generated and throw the output away. Fixed the problem with a function interface not creating a function reference in the parent namespace. The problem was that inside a function without a RESULT being specified, the function name itself is a variable. This was being set in the parent namespace when an interface was being compiled. Fixed the error recovery subroutine within the scanner so that it would eat the rest of a continuation line as well as the current line. If an error is now generated within a line, the next line won't generate an "unclassifiable statement" error. Added a new basic type, BT_PROCEDURE. This is necessary when passing procedures as actual arguments and also procedure assignment statements. Added a copy_ref() subroutine that recursively copies lists of g95_ref structures. This was needed to implement copying of expression nodes that represent variables. Also implemented a similar function to copy the constructor structures. The 16K line Runge-Kutta code I've been mentioning for about the last week is now fully parsed by g95, though to be fair, the code consists of copies of the same code in one, two and three dimensions and I've also edited it a bit so that modules are not needed. 
My Pentium-120 parses all 16K lines in about 13 seconds, giving about 1200 lines/second on an underpowered machine by modern standards. |
| August 18 |
Niels Jensen pointed out (two days ago) that labels associated with DO-loops, and the ERR=, END= and EOR= labels of various I/O statements, generate errors about "labels used but never referenced". I fixed that, but then got interrupted by a storm. Added a fix for the interfaces that involves copying symbols from one namespace to another, but it didn't appear to work. |
| August 16 |
Finished debugging the resolution subroutines for array and structure constructors. Found and fixed a few small problems with compiling the Runge-Kutta code. The big problem has to do with interfaces-- the name of a function or subroutine has to go in the interface's parent namespace, not the namespace of the interface. Also added __DATE__ and __TIME__ macros to the status message. These will come in useful later. |
| August 15 |
Not much time for g95 tonight. I've updated the BUGS file to reflect things that have been taken care of since I last looked at it. It looks like the only major thing left as far as parsing is concerned is reading and writing module information. Given the ease of writing subroutines to read and write lisp-style lists, it shouldn't be that hard. A major thing that does have to change is how the symbol table is organized-- after a USE, you can have two or more names that reference the same entity... |
| August 14 |
Debugged the structure constructor matchers, started debugging the resolution subroutines for the constructors, both arrays and structures. |
| August 13 |
Added parsing of structure constructors. Not tested at all yet. |
| August 11 |
There have been some emails flying back and forth regarding proposals for how g95 should do floating point arithmetic that haven't filtered up here. The upshot is that a new type has recently been donated to GMP, the mpfr_t. This type is mpf (floating point) and the 'r' is for 'rounded'. For each mpfr_t, the number of bits of precision can be set. If we know what the target machine is, then this determines how many bits of precision each kind has. It also means that we can portably emulate arithmetic on the target machine using GMP. Kate Hedström has compiled g95 with the latest GMP and I have repeated her success. I have updated the notes on compiling g95 on the source page. The things left to do here are replacing the mpf_t's with mpfr_t's, working out how many bits are associated with each kind and (later) emulating infinities. Katherine Holcomb also sent mail asking some questions about how intrinsic.c works. |
| August 10 |
Finished debugging the parsing of array constructors. |
| August 8 |
Earlier this year, a list of "The Top 10 Algorithms" was released and caused quite a stir on comp.lang.fortran as well as other places. I was browsing the list in the January issue of "Computing in Science and Engineering", which had an article on each algorithm, one of which was the Fortran I compiler. Some interesting facts about the compiler were that it took 18 man-years to write, was 23,500 assembly statements long, and performed many optimizations considered sophisticated by today's standards. From the article, it also appears that g95 parses fortran programs much like Fortran I did. The full article is online, but the bad news is that you have to connect from someplace that subscribes to CSE or you'll be asked for your credit card number. Unfortunately, no time for g95 today. |
| August 6 |
It became obvious that the current plan for storing expression nodes wasn't going to work. In particular, it wouldn't have been able to represent an array of structures, so I've come up with something else. It's not that different, but it looks like representing arrays is going to be challenging however it is implemented. The code doesn't compile at the moment because I'm still switching parts of it over. I've halfway debugged the parsing of array constructors. I needed to modify the expression matcher in the process-- something like (/ 2 /) was being parsed as a "2" followed by a "/", and the parser then complained about the missing denominator. The solution was of course not to complain about a missing denominator and instead restore the parse pointer to the '/' character. If a denominator is really left out of an expression, then the parser will generate an error when the '/' is reread. |
| August 4 |
Adding the array constructors also requires changes to the expression node in order to store the new data. I've started on that. The same structures will also be used to store structure constructors. Fixed the messed up dates for the last few days... |
| August 3 |
Added parsing of array constructors. It isn't called yet, but it compiles and sort of looks like it works. The code works almost identically to the subroutines that match similar constructions for implied-DO loops in I/O lists. |
| August 2 |
Ok, back. I've been really busy lately-- not a lot of time tonight, but I did fix a few bugs that showed up in the RK code. I removed the symbol_attribute field from the expression node that I mentioned the other day. It turns out that this really needs to be calculated. In an expression like A(1) the subscript actually causes the dimension attribute to be removed from the overall attribute. |
| July 30 |
My RK code showed a deficiency in how a variable expression is stored-- something like "VAR%NAME" does not have the attributes of "VAR", but rather those of its member "NAME". I fixed it by creating another member in the expression node, but I think I'm just going to change it to a subroutine call, calculated when needed. Tobi sent a bunch of small patches related to the intrinsic functions that he has been working on lately. This area needs a little work too. |
| July 29 |
Lots of stuff today. The biggest changes were in the parsing module. The checking for proper statement ordering is now in a single place and this made several of the program unit parsers much smaller. I implemented the last statement matcher, for the USE statement. After that I started running g95 on a fortran 90 Runge-Kutta integrator that is about 16k lines long. This pointed out a lot of previously undiscovered problems. I can get to about line 500, after working around the fact that the USE statement doesn't actually do anything. I also applied a small patch sent by Tobi, and will probably start on Kate's floatlib patch soon. |
| July 27 |
No one commented on the plan for gmp/floatlib, so it must have been a pretty good idea. Finished debugging the subroutine for parsing a contained subprogram. Added a parser for the MODULE statement, which is not debugged yet. I think this may be the last statement matcher that has to be written. I'll start posting binary releases that anyone can test as soon as we have all statements (but not every fortran 95 construct) being matched. I think the thing I will do after this is finished is to better document g95's internals so that it will be easier for others to contribute, from both a matching and a code generation standpoint. |
| July 26 |
Decided what to do about the current GMP/floatlib dilemma: we'll use floatlib for real numbers (the newer version does more than I thought) and keep GMP for integers. Applied a patch that Tobi sent in a few days ago that I didn't understand very well, and also moved the sort_actual function within intrinsic.c to make argument checking easier. Added code for parsing contained subroutines. It is now called, but has not been tested. |
| July 25 |
I didn't have much time to worry about the GMP/floatlib issue today. I only made a small addition to parse.c, two functions needed to parse contained program units. |
| July 23 |
Finished debugging the parsing of the WHERE statement, added parsing of the FORALL statement. G95 is now 20,000 lines long. Kate Hedström sent in a patch that starts to get rid of the GMP library. I've been thinking about this issue of doing target arithmetic for a couple days now and am still unsure of which way to go. On one hand, GMP is easy and portable. On the other hand, this sort of target arithmetic is closer to how the target actually does its math. I'll post something to the mail list soon. On a positive, if only slightly related note, I've gotten the XFree86 4.0 Direct Rendering to talk to my Voodoo 3 video card. It speeds up the GL 'gears' demo by a factor of three. Even better is that it also appears to work fine with IBM's Open DX visualization package. For scientific computing, there isn't too much that isn't freely available for PCs these days. |
| July 22 |
Tobi Schlüter sent in a patch to add a kind for quad precision reals needed by Bertrand Joël. I've started work on parsing WHERE blocks. |
| July 20 |
Bertrand Joël wrote in with a problem of IMPLICIT statements not being recognized within interfaces. Tobi Schlüter sent in a patch that added IMPLICIT statements to interfaces. After applying it, I added some more code to require that the implicit statements come before all of the other statements. I also copied this code to the BLOCK DATA parsing subroutine that I added yesterday. Kate Hedström wrote in with a few miscellaneous problems, some culled from her previous work on the g77 torture test suite. These included a program consisting of a single END statement, an illegal expression that had a bad error message and a problem with the substring matcher. |
| July 19 |
The problems mentioned yesterday appear to be solved. Worked a bit today on parsing BLOCK DATA program units. This involved adding a new parsing subroutine to parse.c, and some logic in check_conflict to complain about illegal attributes found in these program units. Haven't had a chance to test this stuff yet. Got some email from the FSF today; they've received copyright assignment forms from:
* Joseph Cermak
Welcome aboard! |
| July 18 |
I think I've got things back under control with respect to the line wrapping. My thesis project is parsed without any problems and as far as I can tell everything works again. I got a start on fixing the problems with the FORMAT statement, but ran out of time today. |
| July 17 |
Ok, I remembered why I didn't advance lines automatically. The problem is that if a line has something (say, an expression) that ends prematurely, an error is generated that points to the line following the error. I've put a comment at a strategic place to prevent someone else from going down this road again. This sort of seesaw is a bad sign-- it means I haven't thought things out enough before proceeding. Unfortunately, screwed-up error loci outweigh masked parser problems, so it is back the other way again. The code currently compiles, but it generates strange errors in strange places. Wrote Bertrand Joël concerning a problem about a failure to match the end of statement after a FORMAT statement. The fix will have to wait. |
| July 16 |
Catching up on emails today. Tobi sent in a problem that had to do with random kind data being present in derived type typespecs. I think I fixed the problem, but I don't have his test case. Tobi also sent a patch that set the expression locus in g95_match_rvalue. He also found a bug that prevented a CALL to a subroutine without any arguments from being correctly recognized. The reason had to do with matching an end-of-statement twice. One of g95's matching conventions is that when you match something, the scanner points just past the matched thing. The reason EOS was matchable twice was that g95's scanner doesn't move to the next line without an explicit call, done between full statement matches. Originally, this was done to make things clearer, but it is now obvious that it can mask parsing bugs in higher level subroutines. I spent the rest of today's work on g95 removing this misfeature. Now, the scanner returns a '\n' when it hits the end of a line (newlines are not actually stored) and moves to the next line automatically. As before, end of file returns an infinite stream of newlines. This gave me the opportunity to add a single subroutine responsible for skipping comments. This will be useful if we ever need to implement directives in the form of special comments. The g95_match_eos() subroutine was upgraded to match multiple semicolons if present, and several bugs involved in skipping comments were removed. I'm pretty sure that g95_next_char and below are working correctly-- I only have to check that include functionality is still working, as well as the things below g95_next_statement. |
| July 13 |
Fixed the problem reported by Bertrand Joël yesterday. This involved a total rewrite of match_attr_spec() for the sake of the CHARACTER type. The problem was that until we see the double colon, we can't be sure we're looking at an attribute specification if the type is CHARACTER. The standard allows something like CHARACTER*(*), save, target, parameter which defines three variables named save, target and parameter! The analogous code with any other type (INTEGER, save, target, parameter) is illegal. I had originally assumed that seeing a specification keyword (which starts with a comma) guaranteed that we were seeing an attribute specification, but this turns out not to be the case. What the new match_attr_spec has to do is read each of the attribute keywords, storing them until we get a double colon. At this point, we can start making sure the attributes make sense. Before this we can only return MATCH_NO. |
| July 12 |
Bertrand Joël sent in a report of a bug in the parser's handling of character type declarations. The confusion resulted from the mixing of the old and new styles of declaration. No time to fix it tonight. Tobi Schlüter sent a small patch that adds locus information to the expression node generated by matching a variable. |
| July 11 |
No update last night; my monitor had some problems with a lot of internal electrical arcing. I went shopping and now have a new, much better monitor. It is amazing how computer prices drop-- I paid about half as much for this new 17" as I paid for my old 15" several years ago. And one of the best parts?? That "new computer" smell! Anyway, I worked a little bit on complex arithmetic. We can now add, subtract, multiply, divide and compare complex numbers. I am thinking the way to resolve the substring dilemma is to go the function route. It preserves the constancy of EXPR_CONSTANT and we can still do everything that has to be done. G95 is now 19,000 lines long. |
| July 9 |
Worked a bit on parsing substring references. There is a bunch of code, but it is not called, because I have hit something of a dilemma. The problem has to do with how to store a substring reference that appears after a constant string. Normally, references to subobjects, like structure references, array references and substring references, are stored in a singly linked list of g95_ref structures that are attached to an expression node that points to the parent symbol. The "obvious" way to store a substring of a string constant is to just add the reference structure (which holds the start and end indexes) to the expression node structure that represents the constant string. The trouble is, the node has a type of EXPR_CONSTANT, which is no longer really a constant, since the range can be composed of other variables. It isn't a variable either, since it cannot be assigned to. In my experience, changing the meaning of flags in subtle ways like this is a bad idea-- it can very easily break assumptions that one has long since forgotten about. The other, equally disgusting ways of doing this are making the type of the node EXPR_OP and defining a new intrinsic "substring" operator, or, perhaps the best way, making the expression node an EXPR_FUNCTION node with a pre-resolved function that "does" substrings. In any case, the best thing to do at the moment is sleep on it and make a decision later. The analogous case with arrays (subscripting an array constructor directly, say inside a loop like do i=1, 5) is specifically prohibited by the standard for some reason. Fiddled with some intrinsic function resolution issues and realized that I was in over my head there too-- I think one of the biggest mistakes that a lot of people make in programming is just jumping in and writing code without a lot of thought about where they are going. |
| July 8 |
Potpourri today. Fixed several resolution issues, both in and out of intrinsic.c. Added checking for the square root intrinsic-- basically just comparing against zero. Fixed the checking for statement labels-- this makes sure that a label is being used consistently as well as being defined and referenced. Added the matching for binary, octal and hexadecimal constants into the matching of constants, not just DATA statements. |
| July 6 |
Finished the first half of the intrinsic conversion stuff. An intrinsic conversion is now converted to a function call to a special intrinsic that does the conversion at run time or converts a constant at compile time. The REAL, INT and CMPLX "functions" cause one of these functions to be generated depending on the type of their argument. The scheme is also meant to allow extension of the basic types into kinds. Tobi Schlüter sent a patch that matches binary, octal and hexadecimal constants. Normally, these are only allowed in DATA statements, but it might be nice to allow them anywhere an integer constant is allowed. The code compiles again. |
| July 5 |
More stuff on intrinsic type conversions, mostly in intrinsic.c. The code still does not compile or work. |
| July 4 |
Did a little more work on the intrinsic type conversion, and added limit checking to the actual conversion functions added the other day. Also got rid of some of the placeholder stuff that was there before. The code does not work at the moment. Once completed, the REAL, INT and CMPLX handlers should provide Katherine with some good examples of complicated intrinsic handlers-- REAL(1) is different from REAL((1,0)). A couple of days ago, Tobi Schlüter sent a patch that held locus information in a variable named where. I've changed the name of the expr_locus member of the expression nodes (and all references) to where because it is such a better name. |
| July 2 |
I did a survey yesterday of what remains to be done. The results are in the BUGS file. Niels Jensen pointed out that substring parsing was not there. Tobi Schlüter sent a bug fix that adds locus information to the new expression parser; the missing locus was causing error messages to crash. I've added a couple of subroutines to arith.c that convert between constants of various types. There is no range checking at the moment and they are uncalled, but they will form the basis for simplifying the REAL, INT and CMPLX intrinsics. |
| July 1 |
Niels Jensen sent a bug that had to do with a complex constant causing a crash. Fixed it. Worked today on how intrinsic functions are resolved and simplified. The ABS function now works in the sense that it is recognized as intrinsic and the correct specific function is selected by the type of its argument. It even simplifies constant arguments. The plan now is to get the REAL, DBLE and CMPLX functions going and replace the existing _convert placeholder function. |
| June 29 |
No code tonight, but lots of thought, mostly about the details of how the resolution phase has to work and how simplification works its way into this process. My fundamental realization was that the resolution phase rewrites expression nodes. Depending on its argument list, a function can reference a wide variety of things, even within the same function. The start of the resolution phase figures out if the name is generic, specific or neither. This part doesn't depend on the argument list, and so this information can be stored with the symbol. Depending on the status of the symbol, one of three procedures is followed to determine what a function actually refers to. The function reference is represented as an expression node with an expr_type of EXPR_FUNCTION. The 'symbol' member points to the symbol being called. By examining the argument list, we "retarget" the function to call to point to another name-- an external function, an intrinsic function or whatever. By a "name", I mean a real symbol name, one that makes it to the assembler. For an intrinsic function, the name will contain characters that make it uncallable by any means other than g95 determining that it is a reference to an intrinsic subroutine. Simplification of a function call can only happen for intrinsics and can happen as soon as a function is determined to actually reference that intrinsic. Simplification also has to be callable separately for initialization expressions, because these functions have to be simplified before the resolution phase. In this case, there is no resolution and function calls must refer to an intrinsic. There is one weird case I have to investigate before actually moving ahead-- the standard points out that a name in a module can be renamed by a USE statement, and the reference to this new name still refers to the old intrinsic... |
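A minimal sketch of the retargeting step for one generic name, assuming a hypothetical table that maps a generic name plus an argument type to a specific. IABS, ABS, DABS and CABS are the standard specific names behind ABS; the structs and resolve_generic() are illustrative only, and the uncallable assembler names mentioned above are not modeled.

```c
#include <stddef.h>
#include <string.h>

typedef enum { BT_INTEGER, BT_REAL, BT_DOUBLE, BT_COMPLEX } basic_type;

/* One specific procedure behind a generic name. */
typedef struct {
    const char *generic;
    basic_type arg_type;
    const char *specific;
} specific_entry;

/* Hypothetical table; a real intrinsic description would be richer. */
static const specific_entry specifics[] = {
    { "abs", BT_INTEGER, "iabs" },
    { "abs", BT_REAL,    "abs"  },
    { "abs", BT_DOUBLE,  "dabs" },
    { "abs", BT_COMPLEX, "cabs" },
};

/* Hypothetical EXPR_FUNCTION node with a single actual argument. */
typedef struct {
    const char *symbol;      /* procedure this node calls */
    basic_type arg_type;     /* type of its actual argument */
} function_node;

/* Retarget a call to a generic name at the specific that matches the
   argument type. Returns 1 on success, 0 if no specific fits. */
int resolve_generic(function_node *f) {
    for (size_t i = 0; i < sizeof specifics / sizeof specifics[0]; i++) {
        if (strcmp(specifics[i].generic, f->symbol) == 0 &&
            specifics[i].arg_type == f->arg_type) {
            f->symbol = specifics[i].specific;   /* rewrite the node */
            return 1;
        }
    }
    return 0;
}
```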
| June 28 |
Added a patch sent by Niels Jensen a couple of days ago-- this one allows warnings to be deferred until the statement is accepted. This eliminates the problem of issuing a warning in a statement matcher only to have the statement fail to match later. This situation results in a bad warning message. This patch stores the warning message in the same manner as an error message. Started working on some examples of simplification subroutines for Katherine. The first one is just ABS(). G95 is now 18,000 lines long. |
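One way to picture the deferred-warning idea is a single pending buffer that is flushed when the statement is accepted and dropped when it is rejected, the same way a single error message is held. This is a sketch under assumptions: deferred_warning, warning_accept and warning_reject are hypothetical names, and output goes to a string instead of stderr so it is easy to check.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A single pending warning, held until the statement is accepted. */
static char pending[256];
static int have_pending = 0;

/* Output sink for accepted warnings (stderr in a real compiler). */
char warning_log[1024];

/* Record a warning instead of printing it immediately. A later call
   overwrites an earlier pending warning. */
void deferred_warning(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(pending, sizeof pending, fmt, ap);
    va_end(ap);
    have_pending = 1;
}

/* Statement matched: emit the held warning, if any. */
void warning_accept(void) {
    if (have_pending) {
        size_t used = strlen(warning_log);
        snprintf(warning_log + used, sizeof warning_log - used,
                 "Warning: %s\n", pending);
        have_pending = 0;
    }
}

/* Statement rejected: drop the warning so it never appears. */
void warning_reject(void) {
    have_pending = 0;
}
```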
| June 27 |
Yesterday's update was a little late-- Sourceforge had some trouble. The internal motd said something about having problems with IRC people.... Not a lot of time for g95 today. I moved the arithmetic conversion stuff out of expr.c into arith.c. This has two purposes. First, it gets rid of calls to the GMP library from outside arith.c, which clears the way toward replacing GMP. The second purpose is to have the compiler do more arithmetic on its own, in particular being able to evaluate those intrinsic functions that it has to be able to evaluate. For example, if we want to evaluate the ABS() intrinsic, we have to be able to compare a value with zero and negate it if necessary. The comparison and negation are already there, but we need some way of allocating an expression node with a zero value for the comparison to be possible. |
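A sketch of the ABS() simplification just described, using plain C long values in place of the GMP-backed constants in arith.c. new_constant, compare_constants and negate_constant are hypothetical stand-ins for the existing comparison and negation routines.

```c
#include <stdlib.h>

/* Hypothetical integer constant node. */
typedef struct { long value; } constant;

/* Allocate a constant node holding v (e.g. the zero the comparison needs). */
constant *new_constant(long v) {
    constant *c = malloc(sizeof *c);
    if (c != NULL) c->value = v;
    return c;
}

/* Three-way comparison: -1, 0 or 1. */
int compare_constants(const constant *a, const constant *b) {
    return (a->value > b->value) - (a->value < b->value);
}

/* Negation into a new node (no overflow check for the most negative value). */
constant *negate_constant(const constant *a) {
    return new_constant(-a->value);
}

/* Simplify ABS(x) for a constant x: compare against zero, negate if
   needed. Returns a new node owned by the caller. */
constant *simplify_abs(const constant *x) {
    constant *zero = new_constant(0);
    if (zero == NULL) return NULL;
    constant *result = compare_constants(x, zero) < 0
                           ? negate_constant(x)
                           : new_constant(x->value);
    free(zero);
    return result;
}
```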
| June 26 |
Niels Jensen sent in some minor patches that fixed typos and such. I accepted a revision of intrinsic.c that contains a lot of cosmetic cleanups, and we are working on a new scheme for holding g95_warning() messages back until the line to which the warning applies has been accepted. This prevents a wrong message from showing up if the statement is rejected. The FSF has gotten in touch with me regarding an account that I will be able to use to check the status of copyright assignments. Of course, they use Kerberos, which means I have to install yet another software package... |
| June 25 |
Major rewrite on the expression matcher. This is actually one of the earliest parts of g95 that was written. After the mechanics of scanning were complete, it seemed like a lot of things depended on matching expressions. After all, FORTRAN is FORmula TRANslation. This matcher was a simple infix parser that took a stream of tokens provided by a lexical analyzer and built them into a tree of expression nodes. The fundamental problem was that I was never quite sure that this parser followed all of the rules dictated by the standard. For example, something like A+-B is not allowed by the rules, but A.EQ.-B is. Anyway, the patches to fix the parser kept piling up and it wasn't clear what problems were still there and how to find them. The new expression matcher works the same way as other matchers within g95-- we try to match something and if that doesn't work, we try to match something else. This also lets us implement the rules for matching expressions as they are in the standard and gives me some confidence that we are doing things right, without hidden surprises. The downside of this method (in general) is that it is slower than the stream-of-tokens approach. This is because the token approach can back up sooner when a particular syntactic state is determined to be wrong. The matching approach of g95 eats larger "tokens" and consequently will need more backing up. When running it on my thesis project, I get the feeling that it is slower than it was before, but it is still acceptable. It is faster than g77 (which is doing a lot more than just parsing fortran), and as long as the compile times are on the same order of magnitude, we're fine. I've applied patches sent by Tobi and Niels that fix a lot of small things. My mailbox is now down to less than a dozen letters for the first time in a long time. My next priority is to get Katherine going again, no matter how many patches I get peppered with... |
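To illustrate the try-then-back-up style, and why A+-B fails while A.EQ.-B succeeds, here is a small recursive matcher in C for a subset of the standard's expression levels: single-letter variables, parentheses, **, * and /, + and -, and .EQ. only (concatenation and the logical levels are left out). All names are hypothetical, not g95's matchers; each function restores the cursor when it returns MATCH_NO.

```c
#include <ctype.h>
#include <string.h>

typedef enum { MATCH_NO, MATCH_YES } match;

static const char *cur;          /* current position in the statement */

/* Match a literal string, advancing only on success. */
static int match_str(const char *s) {
    size_t n = strlen(s);
    if (strncmp(cur, s, n) != 0) return 0;
    cur += n;
    return 1;
}

static match level4(void);

/* primary: a single-letter variable or a parenthesized expression */
static match primary(void) {
    const char *old = cur;
    if (isalpha((unsigned char)*cur)) { cur++; return MATCH_YES; }
    if (match_str("(") && level4() == MATCH_YES && match_str(")"))
        return MATCH_YES;
    cur = old;                   /* undo partial progress */
    return MATCH_NO;
}

/* mult-operand: primary [ ** mult-operand ] (no sign allowed after **) */
static match mult_operand(void) {
    if (primary() == MATCH_NO) return MATCH_NO;
    const char *old = cur;
    if (match_str("**")) {
        if (mult_operand() == MATCH_YES) return MATCH_YES;
        cur = old;               /* nothing valid after **: back up */
    }
    return MATCH_YES;
}

/* add-operand: mult-operand { (* or /) mult-operand } */
static match add_operand(void) {
    if (mult_operand() == MATCH_NO) return MATCH_NO;
    for (;;) {
        const char *old = cur;
        if (!(match_str("*") || match_str("/"))) break;
        if (mult_operand() == MATCH_NO) { cur = old; break; }
    }
    return MATCH_YES;
}

/* level-2: [sign] add-operand { (+ or -) add-operand } */
static match level2(void) {
    const char *start = cur;
    if (!match_str("+")) match_str("-");     /* optional leading sign */
    if (add_operand() == MATCH_NO) { cur = start; return MATCH_NO; }
    for (;;) {
        const char *old = cur;
        if (!(match_str("+") || match_str("-"))) break;
        /* an add-operand may not start with a sign, so A+-B stops here */
        if (add_operand() == MATCH_NO) { cur = old; break; }
    }
    return MATCH_YES;
}

/* level-4 (simplified): level-2 [ .EQ. level-2 ] */
static match level4(void) {
    if (level2() == MATCH_NO) return MATCH_NO;
    const char *old = cur;
    if (match_str(".EQ.")) {
        if (level2() == MATCH_YES) return MATCH_YES;
        cur = old;
    }
    return MATCH_YES;
}

/* Returns 1 if the whole string is a valid expression in this subset. */
int is_expression(const char *s) {
    cur = s;
    return level4() == MATCH_YES && *cur == '\0';
}
```

A sign is accepted only at the start of a level-2 expression, so A+-B leaves "+-B" unmatched, while the right-hand side of .EQ. is a fresh level-2 expression that may begin with a sign. The same structure rejects A**-B.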
| June 22 |
Added a patch sent by Niels Jensen that added the $-format descriptor to the FORMAT checking code. We also changed the DO matching so that it avoids leaving symbol table modifications lying around, just like the IF statement a couple of days ago. The DO WHILE statement now generates a new EXEC_DO_WHILE node instead of a regular EXEC_DO node. This new node type made more sense than dealing with the overloaded meanings of the g95_iterator structure. Tobi and Niels still have several emails pending... |
| June 21 |
Applied a patch sent by Niels Jensen that cleaned up the source, including adding a display_help() function that looks sort of like g77's. Niels also sent a patch to correct a problem matching tags associated with variables. This revealed a deeper problem: we really need to be matching a very restricted form of expressions, not symbols. This caused lots of changes in io.c. He also found a problem with matching real literal constants that I introduced last night. It is fixed now, along with the corresponding bug in the subroutine matching complex numbers. |
| June 20 |
Finished adding a patch sent by Tobi Schlüter for matching complex constants. I also made the subroutines for matching a component of a complex constant similar to that of matching a single real constant. In the process, I found a bug that dropped the last digit of these constants. Added Tobi's preliminary patch to match pointer assignments. I haven't had a chance to test it yet. Applied a patch sent by Niels Jensen that correctly matches an implied unit expression in a format specifier. Claus Fischer wrote in to point out that the ENTRY statement wasn't being recognized as an executable statement. This has been fixed. |
| June 19 |
Now that we're adding more and more command-line options, it seemed like it was time to figure out a consistent way to handle how options are set and stored. To this end, I've added a typedef-ed structure called g95_option whose members contain all the options from the command line. The current options are there now. Niels Jensen sent a patch to add a -pedantic option, and I think this is something we definitely want to have, but there are some problems with just calling g95_warning()-- such messages appear immediately, even in cases where they should not be displayed at all. More work is necessary here. Fixed a bug I added yesterday when checking for old-style size specifiers. Both Niels Jensen and Tobi Schlüter sent patches for the fix. Claus Fisher and Niels Jensen identified a bug associated with parsing a FORMAT statement. Niels tracked the bug down to g95_gobble_whitespace() and stomped it. Niels also sent a fix for a bug in the matching of a BLOCK DATA statement that I am shocked I missed, as well as fixes to the OPEN and INQUIRE statements. He also sent a patch to allow a dollar sign in FORMAT statements. Tobi Schlüter has two patches pending which are a little larger and more involved-- matching complex constants and pointer assignments. I will get to these soon. My mailbox is now down to a manageable number of letters. Hopefully it will stay this way for a while... |
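A sketch of the options-structure idea, assuming hypothetical member names and a hand-rolled argv loop; only -v, -r and the proposed -pedantic are modeled, and this is not the actual g95_option layout.

```c
#include <string.h>

/* Hypothetical layout: one member per command-line option. */
typedef struct {
    int verbose;          /* -v: dump namespaces and code */
    int resolve;          /* -r: run the resolution phase */
    int pedantic;         /* -pedantic: warn about extensions */
    const char *source;   /* the file to compile */
} g95_option;

/* Fill opts from argv. Returns 0 on success, -1 on an unknown option. */
int parse_options(int argc, char **argv, g95_option *opts) {
    *opts = (g95_option){ 0 };
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-v") == 0) opts->verbose = 1;
        else if (strcmp(argv[i], "-r") == 0) opts->resolve = 1;
        else if (strcmp(argv[i], "-pedantic") == 0) opts->pedantic = 1;
        else if (argv[i][0] == '-') return -1;
        else opts->source = argv[i];
    }
    return 0;
}
```

Keeping every option in one structure means new options only touch the structure and this loop, rather than adding scattered global flags.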
| June 18 |
Things are improving. I recently bought a DSL line and trying to get it working made it really obvious how out of date my system was. I was running x86 Linux 2.0.27 based on a version of Slackware that was at least six years old. After many hours of trying to make things work piecemeal, I bought RedHat 6.2 for $16, backed up everything on tape, installed 6.2 and pulled selected bits off of the tape. Right now everything works pretty much as it did before, except that I have a usable system and a fast, live connection to the net. Life is good. I finished applying Steven Johnson's patch with one little exception that had to be made for a user operator named '.e.' or '.d.' and improved error diagnostics on expressions. I've also modified g95_match_interface() not to accept defined operator names with non-alphabetic characters. A long time ago, I can remember being quite puzzled as to why the standard specified this... The work on this problem led to the discovery of a serious problem in the IF statement, which led to the same serious problem in the decode_statement() subroutine. The problem was failing to abide by the convention that all matchers are allowed to mess with the symbol table, with the understanding that on MATCH_NO or MATCH_ERROR any changes would be undone by the caller. This wasn't happening all the time in decode_statement(). This convention has serious consequences for the matching of a simple IF statement, that is, an IF-clause followed by a single executable statement. The single executable statements have their own matchers that we need to call. While we can distinguish between which matcher to call by looking at the next keyword (which doesn't affect the symbol table), the problem is that an assignment statement can start with the same keywords. 
The problem that was happening was that the symbol table had been modified by the IF's control expression and these changes were being undone in the process of successively matching the different action statements. The much more careful algorithm first matches the IF-expression, then tries to see what comes next. If it is an arithmetic IF (Steven Johnson noted this was missing a while back) or an IF-THEN, these are matched straight through, because doing so does not affect the symbol table. We then try to match an assignment statement. If this doesn't work, we undo symbols (getting rid of things built up by the assignment matching, as well as by the control expression). We then re-match the IF-expression part (which is guaranteed to succeed, since it worked before), then we peek at the next keyword and call the appropriate statement matcher to see if the rest is correct. More involved, but it works correctly. Applied a patch sent by Steven Johnson that set the length of a CHARACTER declaration to one when nothing else was present. Fixed a bug sent by Niels Jensen in which the character-constant matcher generated an error too quickly instead of a MATCH_NO. Fixed a pair of bugs associated with another problem Niels sent. The array-reference matching didn't work right on a whole array reference. The other problem was that the literal constant matching subroutines couldn't match signed quantities. The matchers were written this way on purpose because '-' is also an operator that has to be matched separately. The numeric constant matching subroutines now take a flag that indicates whether to match a sign or not. If someone wants to write the matcher for binary, octal and hexadecimal integers, please go ahead. Fixed a problem with matching literal constants sent by Tobi Schlüter. The matcher has to try to match a character constant before an integer constant because the kind parameter for a character constant is in front and it is a valid integer expression. 
Applied a patch sent by Niels Jensen that fixed some problems with the kind-number assignments. In g95, the kind numbers for complex numbers are the same as those for real numbers. Another unstated rule is that for numeric kinds, the kind values are sorted by precision, so that if k1>k2, then k1 has more precision. Also fixed a problem reported by Niels Jensen in which length specifications following actual variables were not recognized. Claus Fisher sent in three bug reports. One was the failure to fully evaluate an initialization expression (still working on that), the second was the problem of reading a real number instead of an integer followed by an operator, which today's fixes address, and the third was SUBROUTINE X2 which generates an error complaining about an intrinsic unary operator following a binary operator. This error was intended to flag things like A**-B. It turns out that this restriction is only for numeric operators. At this point, I am tempted to throw away the current g95_match_expr() subroutine and replace it with something that is more in line with how the standard explains expression composition-- level one through five expressions, each composed in various ways that automatically generate the precedence rules. This is opposed to the existing, fairly textbook infix parser that is trying to mimic these rules. Added support for specifying kinds in the nonstandard *<number> format, which it turns out was never a part of any standard. On the plus side, g95 now compiles my thesis project without complaints... There are a couple of patches I have not put in just yet. I will try to get to those tomorrow. |
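The undo convention from this entry (a matcher may change the symbol table, and the caller undoes those changes on MATCH_NO or MATCH_ERROR) can be sketched with a table that marks newly created symbols until they are committed. Everything here is hypothetical and simplified: a real table also has to roll back attribute changes on existing symbols, which this sketch ignores.

```c
#include <string.h>

#define MAX_SYMS 64

/* Hypothetical symbol table: names plus a flag marking symbols created
   since the last commit. */
static const char *names[MAX_SYMS];
static int is_new[MAX_SYMS];
static int nsyms = 0;

/* Find or create a symbol; new ones stay tentative until committed. */
int get_symbol(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(names[i], name) == 0) return i;
    if (nsyms == MAX_SYMS) return -1;
    names[nsyms] = name;
    is_new[nsyms] = 1;
    return nsyms++;
}

/* MATCH_YES: keep everything the matcher created. */
void commit_symbols(void) {
    for (int i = 0; i < nsyms; i++) is_new[i] = 0;
}

/* MATCH_NO or MATCH_ERROR: the caller throws away tentative symbols. */
void undo_symbols(void) {
    int kept = 0;
    for (int i = 0; i < nsyms; i++) {
        if (!is_new[i]) {
            names[kept] = names[i];
            is_new[kept] = 0;
            kept++;
        }
    }
    nsyms = kept;
}

int symbol_count(void) { return nsyms; }
```

Note that undo_symbols() also discards the symbols created while matching the IF-expression, which is why the IF-expression has to be matched a second time before the action-statement matchers are tried.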
| June 15 |
Worked on applying the patches by Steven Johnson and others concerning the over-greedy tendency of the real-number matching subroutine to eat its way into an operator in certain cases. I've convinced myself that their fix is OK because I can't think of any legal context in which an alphabetic character can follow a floating-point number. Deep in the standard, user-defined operators are prohibited from containing digits. I am quite tired tonight, so checking it will have to wait. Hopefully things will settle down soon. |
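A sketch of the lookahead this fix implies: after the digits and a '.', a following letter ends the number unless it begins an exponent (E or D, an optional sign, then a digit). Because operator names cannot contain digits, '.e5' cannot start an operator, while '.eq.' and the user operator '.e.' can. scan_number and its helpers are hypothetical names, and kind suffixes are not handled.

```c
#include <ctype.h>

/* Does s start an exponent part: [eEdD] [+-] digit ? */
static int exponent_at(const char *s) {
    if (*s != 'e' && *s != 'E' && *s != 'd' && *s != 'D') return 0;
    s++;
    if (*s == '+' || *s == '-') s++;
    return isdigit((unsigned char)*s);
}

/* Length of an exponent part already known to be present. */
static int exponent_len(const char *s) {
    int n = 1;                              /* the E or D */
    if (s[n] == '+' || s[n] == '-') n++;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Scan an unsigned numeric literal at s. Returns its length (0 if s does
   not start with a digit) and sets *is_real. A '.' followed by a letter
   is left alone unless the letter starts an exponent, so 2.gt.1 scans as
   the integer 2 rather than the real 2.0. */
int scan_number(const char *s, int *is_real) {
    int n = 0;
    *is_real = 0;
    while (isdigit((unsigned char)s[n])) n++;
    if (n == 0) return 0;

    if (s[n] == '.') {
        const char *after = s + n + 1;
        if (isalpha((unsigned char)*after) && !exponent_at(after))
            return n;                       /* the '.' begins an operator */
        *is_real = 1;
        n++;                                /* consume the '.' */
        while (isdigit((unsigned char)s[n])) n++;
    }
    if (exponent_at(s + n)) {
        *is_real = 1;
        n += exponent_len(s + n);
    }
    return n;
}
```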
| June 12 |
Started dealing with the backlog associated with the current volume of mail on the list. Steven Johnson sent in a patch to add DOUBLE COMPLEX support in type-matching statements. As I mentioned in email, Bill Clodius pointed out this deficiency long ago and I put off implementing it, mainly because DOUBLE COMPLEX is not part of the Fortran 95 standard. Nevertheless, I have included it because it is something that people seem to expect. Tobi Schlüter sent in a patch to match constant complex numbers. I took part of it and asked for part of it to be rewritten. Support for complex numbers amounts to adding more GMP dependencies at a time when I wanted to start moving away from that, but it seems to me that it enhances g95's ability to parse fortran 77 programs for the moment, which enhances people's ability to test g95. I've also added Steven Johnson to the CONTRIB file. There are a couple of definite bugs left in my inbox, but I am out of time for tonight. |
| June 11 |
Finished adding parsing for the DATA statement. It looks like it works. Tobi pointed out that the INCLUDE directive replaces itself with the contents of the file and *does not* change the form of the program being parsed. That's fixed now. The mail list had a lot of traffic today, about a dozen letters. There were several patches, including adding a DOUBLE COMPLEX keyword, which many people expect even though it isn't part of the standard. There was a patch to add parsing of complex numbers. The worst news was that g95 fails to parse 2.gt.1.and.flag correctly. The problem is that g95 relies fundamentally on "greedy" matching-- the "1." is read as a floating point 1.0 instead of integer one. Not sure how I am going to deal with this at the moment. I will look at the accumulated messages in more detail tomorrow night or perhaps Tuesday. G95 is now 17,000 lines long. |
| June 10 |
Lots of diverse things today: Fixed bugs in the data transfer list matcher so that an explicit UNIT= tag is parsed correctly. Rewrote the next_fixed function so that it uses the g95_next_char functions instead of examining the line buffer directly. This was because the INCLUDE directive plays fast and loose with the line buffer. The previous version was also not quite correct with respect to comments. Realized that there was a problem in including files formatted in a different form than the current form. The upshot was that the first line of such files would be read in the wrong form. The solution was to move the include logic up a level into the next-statement function. Added parsing of statement functions. Started working on parsing a DATA statement. This is the last of the fortran 77 statements that I know about. In fact, the last couple of bugs have been found by running g95 on programs that are a couple hundred lines long. I've added a new -r option that runs the resolution phase-- this prevents problems with not being able to fully resolve names yet. |
| June 9 |
Added a command line option which is meant to be for debugging purposes only. The -v option is 'verbose', and controls the printing of the namespace and code structures. They were printed by default, but now a -v must be given. This is to prevent lots of text from being printed on program units that are just fine. Tried tracking down the problem Tobi found the other day, but didn't get very far. I think some cleanup has to be done in the area of kind parameters-- this code was written very early in g95's history and there appear to be redundant subroutines floating around... |
| June 8 |
I mailed some notes on intrinsic interface checking to Katherine today, a copy should be in the mailing list archives by now. Not a lot of chance for work on g95 today. |
| June 6 |
Niels Jensen sent in another bug associated with matching array subscripts which is now fixed. Wrote a function g95_match_variable(), which matches a variable that can be assigned. Something like this is required for a DATA statement such as DATA A / 6 /, since if this matcher used g95_match_expr, the expression "A/6" would be matched, which isn't correct at all. There are a bunch of other places that will be cleaner with a variable-matching subroutine. Fixed some serious typos in last night's update; I was pretty wiped... |
| June 5 |
Katherine Holcomb, who works with the Legion project, has arranged an account for me-- this means easy access to a couple of commercial f90 compilers for when the standard is vague, and to an alpha platform for testing. Unfortunately, I didn't have any time for g95 tonight. |
| June 4 |
Niels Jensen wrote in with a few bug reports and I've fixed all the ones he found and a few others. I've actually started applying g95 to larger blocks of fortran 77 code, with some encouraging results-- some problems still exist, but there are a lot of statements that are read correctly. |
| June 3 |
Implemented parsing of the FORMAT statement. One of my pet peeves about fortran compilers over the years has been compilers that defer the error checking to the library. It really sucks to specify a bad format that causes your program to crash after several hours, just before it prints a result you wanted. Although I am more experienced now and never make format errors (well, hardly ever), I still think a good compiler should check constant format strings. It took quite a chunk of code to do this, and now there is a new source file, format.c. Also started work on logic to keep track of labels within a program unit, making sure there are no duplicate labels defined and that labels are used in the correct manner. |
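A sketch of compile-time format checking for a tiny subset of edit descriptors (Iw, Fw.d, Ew.d, A[w], nX and parenthesized groups with repeat counts). This is not format.c: the names are hypothetical, the input must contain no blanks, and it only checks that required widths are present rather than enforcing every rule on their values.

```c
#include <ctype.h>

/* Cursor into the format string being checked. */
static const char *fp;

/* Read an unsigned integer at the cursor; returns -1 if none is present. */
static int number(void) {
    if (!isdigit((unsigned char)*fp)) return -1;
    int v = 0;
    while (isdigit((unsigned char)*fp)) v = v * 10 + (*fp++ - '0');
    return v;
}

static int item_list(void);

/* One item: an optional count followed by I, F, E, A, X or a nested group. */
static int item(void) {
    int r = number();                   /* repeat count, or the n of nX */
    if (r == 0) return 0;               /* a zero count is an error */
    char c = (char)toupper((unsigned char)*fp++);
    switch (c) {
    case 'I':                           /* Iw */
        return number() >= 0;
    case 'F': case 'E':                 /* Fw.d, Ew.d */
        if (number() < 0 || *fp++ != '.') return 0;
        return number() >= 0;
    case 'A':                           /* A or Aw */
        number();
        return 1;
    case 'X':                           /* nX: the count is required */
        return r > 0;
    case '(':                           /* r(...) */
        return item_list() && *fp++ == ')';
    default:
        return 0;
    }
}

/* item {',' item}, stopping before the closing ')' */
static int item_list(void) {
    if (*fp == ')') return 1;           /* an empty list is allowed */
    for (;;) {
        if (!item()) return 0;
        if (*fp != ',') return 1;
        fp++;
    }
}

/* Check a whole format specification such as "(I5,2(F10.3,A))". */
int check_format(const char *s) {
    fp = s;
    if (*fp++ != '(') return 0;
    if (!item_list()) return 0;
    return *fp++ == ')' && *fp == '\0';
}
```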
| June 1 |
Katherine Holcomb sent back a copy of intrinsic.c with all of the elemental functions marked. Other than that, not a lot of time for g95. As soon as g95 can parse all of the statements correctly, it will be in the 'larva' state. When we generate code, the 'pupa' state. When g95 is done, we'll see if it will be a beautiful butterfly, or just a big bug.... |
| May 30 |
More statement parsing. Added the EQUIVALENCE, ENTRY, MODULE PROCEDURE and SAVE statements. This might sound like a lot, but these last couple of statements generally just sort of save the data without really doing much with it. On the other hand, we're almost to the point where anyone in the world can run actual fortran programs through g95 and help debug the parser. When we get to that point, I'll start posting binaries on the website so that people won't have to compile g95 themselves. The DATA and FORMAT statements are all that remain of the legacy Fortran 77 statements. The FORALL, USE and WHERE statements are the remaining Fortran 90/95 statements. G95 is now 16,000 lines long. |
| May 29 |
Implemented the parsing of the NAMELIST and MODULE statements. |
| May 28 |
Implemented the parsing of the COMMON statement. It appears to work... I am taking a break from name-resolution questions for a while. |
| May 27 |
Spent too much time thinking about a conundrum involving name resolution. I'll post the question to comp.lang.fortran tomorrow. After that, I thought about what's necessary for intrinsic function resolution. After scaling back the original plan to something simpler (identifying elemental intrinsics), I've passed that on to Katherine. Other than that, I implemented the BLOCK DATA statement, but not the BLOCK DATA parser. |
| May 26 |
Small changes to add_sym() to copy the simplify function pointer into the symbol table being built. I am currently thinking a lot about how symbol resolution has to work. Even though this is a long weekend for the US (Memorial Day), my mother is coming to visit and I will probably not have a lot of time for g95. |
| May 25 |
Did some thinking about how the resolution process will affect symbols, and noticed that the PARAMETER attribute was a bitfield in the attribute list when it should have been one of the flavors, because a PARAMETER is mutually exclusive with a lot of the other flavors. Debugged those changes and the changes made to intrinsic.c the other day regarding the new simplification member. |
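Why a flavor works better than a bit: with a single enum field, a symbol simply cannot be a PARAMETER and a VARIABLE at the same time, while attributes that can legitimately combine stay as bits. A minimal sketch with hypothetical names:

```c
/* Hypothetical flavors: exactly one per symbol. */
typedef enum {
    FL_UNKNOWN, FL_VARIABLE, FL_PARAMETER, FL_PROCEDURE, FL_DERIVED
} sym_flavor;

typedef struct {
    sym_flavor flavor;
    unsigned dimension : 1;   /* combinable attributes remain bitfields */
} symbol_attr;

/* Give a symbol a flavor; fails if it already has a different one. */
int set_flavor(symbol_attr *a, sym_flavor f) {
    if (a->flavor != FL_UNKNOWN && a->flavor != f) return 0;
    a->flavor = f;
    return 1;
}
```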
| May 23 |
Updated intrinsic.c a bit, adding a new field to the structure that holds a description of each intrinsic procedure. The field is a pointer to a subroutine that will be responsible for simplifying an expression node that holds a call to that procedure. Some of these are required for a compiler and some are optional. This meant a bit of typing... |
| May 22 |
I got a good answer from Steve Johnson about my question on va_arg, which read in part: [...] The va_copy_ macro was one of Martin Otte's solutions, but I didn't know how portable it was at the time. Toon Moene wrote in to point out some compilation warnings in the current version of error_print() on his alpha. The easy fix was to replace the single array of saved argument pointers by several small arrays of real arguments-- integers, characters and character pointers. This implementation limits how many arguments can be passed to g95_error(), but since this is only within the compiler itself, it doesn't seem like too serious a limitation. Martin Otte reports that the current version compiles fine on the PPC. Spent some time debugging intrinsic.c, fixing lots of fairly simple bugs. The function resolution subroutine now calls the resolution for intrinsic functions. So calling a function at the moment assumes that it is an intrinsic function being called. It shouldn't be too long until Katherine can resume working on functions to verify the interfaces for intrinsic functions that don't fit the simple model in place now. I think another possible project will be for someone to implement functions that emulate intrinsic functions within the compiler. |
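A sketch of the "several small arrays of real arguments" approach: walk the va_list once, copy each argument into an array for its type, and then format from the copies as often as needed. It supports only %d, %c, %s and %%; MAX_ARGS, saved_args and format_error are hypothetical names, and the per-type limit mirrors the limitation mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8   /* hypothetical per-type limit on saved arguments */

/* Saved copies of the arguments, one small array per C type. */
typedef struct {
    int ints[MAX_ARGS];
    int chars[MAX_ARGS];           /* a char argument arrives promoted to int */
    const char *strs[MAX_ARGS];
    int ni, nc, ns;
} saved_args;

/* Walk the va_list exactly once, copying each argument by type.
   Returns 0 on an unsupported conversion or too many arguments. */
static int save_args(const char *fmt, va_list ap, saved_args *s) {
    s->ni = s->nc = s->ns = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%') continue;
        switch (*++p) {
        case 'd':
            if (s->ni == MAX_ARGS) return 0;
            s->ints[s->ni++] = va_arg(ap, int);
            break;
        case 'c':
            if (s->nc == MAX_ARGS) return 0;
            s->chars[s->nc++] = va_arg(ap, int);
            break;
        case 's':
            if (s->ns == MAX_ARGS) return 0;
            s->strs[s->ns++] = va_arg(ap, const char *);
            break;
        case '%':
            break;
        default:
            return 0;
        }
    }
    return 1;
}

/* Render the message from the saved copies; this can be repeated freely. */
static void render(const char *fmt, const saved_args *s, char *out, size_t size) {
    int i = 0, c = 0, k = 0;
    size_t n = 0;
    for (const char *p = fmt; *p && n + 1 < size; p++) {
        if (*p != '%') { out[n++] = *p; continue; }
        char buf[32];
        const char *piece = buf;
        switch (*++p) {
        case 'd': snprintf(buf, sizeof buf, "%d", s->ints[i++]); break;
        case 'c': buf[0] = (char)s->chars[c++]; buf[1] = '\0'; break;
        case 's': piece = s->strs[k++]; break;
        default:  buf[0] = '%'; buf[1] = '\0'; break;   /* "%%" */
        }
        while (*piece && n + 1 < size) out[n++] = *piece++;
    }
    out[n] = '\0';
}

/* Format into out (size must be at least 1); returns 0 if fmt is unsupported. */
int format_error(char *out, size_t size, const char *fmt, ...) {
    saved_args s;
    va_list ap;
    va_start(ap, fmt);
    int ok = save_args(fmt, ap, &s);
    va_end(ap);
    if (ok) render(fmt, &s, out, size);
    return ok;
}
```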
| May 21 |
Got caught up in some excitement elsewhere last night and forgot to update the webpage. All the modifications for the resolution phase in existing code appear to be complete. The rewrite of error_print() is also complete. On to new stuff! |
| May 19 |
Martin Otte wrote back that my fix didn't work and sent a couple of suggestions. After only one compiled here, I realized that doing anything with va_list other than messing with the va_* functions/macros is a bad idea. So I need to rewrite error_print(). Other than that, I finished resolving the SELECT statement and added resolution for the DO statement and made a start on array specifications. |
| May 18 |
Martin Otte wrote in to say that he had compiled g95 on a PPC-based Macintosh running Linux. The only problem he had was a little illegal manipulation of a va_list type on my part in the error_print subroutine, declared as static void error_print(char *type, char *format0, va_list argp0).
I've sent him a hack that replaces the assignment statement with a memcpy. I'm not sure how legal or portable this will be... any ideas would be appreciated. I thought I was pretty good with C, but I need a wizard's opinion. If portability is really bad here, we can probably code around the need for the copy/assignment. After that, I started working on moving type-checking code from the parsing phase to the new resolution phase. The assignment, IF and alternate RETURN statements have been converted. The SELECT statement is taking a lot more effort. Sourceforge permissions were screwed up again yesterday, but appear to be fixed. Again. |
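For the record, the portable way to copy a va_list is the va_copy macro that C99 added to <stdarg.h>; neither plain assignment nor memcpy is guaranteed to work, since va_list may be an array type or hold pointers into internal state. A minimal sketch (format_message is a hypothetical name) of a function that has to walk its arguments twice:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly allocated string. The arguments are walked
   twice, once to measure and once to print, so the second walk uses a
   copy made with va_copy instead of an assignment or memcpy. */
char *format_message(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);

    int len = vsnprintf(NULL, 0, fmt, ap);        /* first pass: length only */
    va_end(ap);

    char *buf = NULL;
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2); /* second pass: the text */
    va_end(ap2);
    return buf;                                    /* caller frees */
}
```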
| May 17 |
After much thought, I've decided to resume work on the current g95 rather than try retargeting SGI's compiler for the gcc back end. A couple of other reasons for the decision are:
One of the most ironic things about programming is that scientists, who produce some of the worst code ever written, generally spend most of their time adapting someone else's code for their own purposes. As a scientist myself, I've been there and don't like it much either. Even though the SGI front end is already large (380k lines) and g95 is still embryonic (now larval?), I think that g95 will end up being much smaller than the SGI compiler because of g95's totally different approach towards reading a fortran 95 source file. The SGI authors opted for the traditional token-based compiler which complicates the lexical analyzer a great deal. There were some licensing issues regarding g95, SGI-IA-64 and gcc which in the end I thought relatively unimportant. Anyone interested can follow the banter on the g95-develop and gcc mailing lists. My view is that the licensing issues determine g95/SGI-64's "final resting place", as it were, and little else. What I want is a fortran 95 compiler that will work on most of the computers that I will be using over the next couple decades. I am therefore heading in that general direction. |
| May 15 |
Big News-- SGI has released the source to their fortran 90 compiler. The license is GPL-2, which doesn't pass the FSF's smell test. I am not sure how this will affect g95's development. Continue on with the current g95? Jump ship and retarget SGI's compiler to produce RTL instead? (license problems here). Convince SGI to change the license to GPL-1? We live in interesting times... |
| May 14 |
Worked more on making the resolution phase of compilation a reality. Mostly it involved moving stuff around, from the parsing phase to different subroutines happening later. I started debugging and the first bug turned out to be that the new resolution phase was never even called, and that raised a big question about when to call it. It turns out that it has to be called at the very end of the program unit parsing, after any internal subroutines have been parsed themselves. The main subprogram has to be resolved first, followed by the internal subprograms. This ordering suggests that we traverse the namespace structures, which are linked exactly in this manner. Not much time for further work today. The permissions on sourceforge are still incorrect, so files you see will be from the 11th. |
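A sketch of that traversal order, assuming a hypothetical namespace node with a pointer to its first contained subprogram and sibling links between contained subprograms. The resolve callback stands in for the real resolution work; record_visit just logs the order.

```c
#include <stddef.h>

/* Hypothetical namespace node: a program unit points to its first
   contained subprogram, and those are chained through sibling. */
typedef struct ns {
    const char *name;
    struct ns *contained;
    struct ns *sibling;
} ns;

typedef void (*resolve_fn)(ns *n, void *ctx);

/* Resolve a unit before its contained subprograms. This is only safe
   once the whole unit, including the CONTAINS part, has been parsed. */
void resolve_namespace(ns *n, resolve_fn resolve, void *ctx) {
    resolve(n, ctx);
    for (ns *c = n->contained; c != NULL; c = c->sibling)
        resolve_namespace(c, resolve, ctx);
}

/* Example callback: append each namespace name to a small log. */
typedef struct { const char *order[16]; int count; } visit_log;

void record_visit(ns *n, void *ctx) {
    visit_log *visits = ctx;
    if (visits->count < 16) visits->order[visits->count++] = n->name;
}
```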
| May 13 |
The permissions on the sourceforge ftp server are screwed up again. What you see there is the May 11 upload. Fixed the subroutines I wrote last Friday to handle zero-length argument lists, and also cleaned them up a bit. Added a subroutine to do type and kind comparisons between argument lists and added subroutines that call these subroutines to check intrinsic functions and subroutines. At this point, we should be able to check those subroutines that don't require any special checking. I got around to thinking about testing this new stuff, and the easiest way seems to be to just hook the intrinsic name-resolution into the big name resolution that has to happen at the end of compiling a program unit. If you've never read Chapter 14 of the standard, it's got the densest prose anywhere in the standard. As usual, I'll implement it simple-but-wrong first and work on improving it later. I've been thinking more about projects that other people could work on, and one other thing that we could use is implementing functions that do the work of intrinsic functions within the compiler-- i.e. if certain intrinsics operate on constants, then we should precompute the answer rather than letting it be done at runtime. More on this later. |
| May 11 |
The sourceforge guys have fixed the permissions problems on the ftp server-- for the last four days, the world has been seeing the files from the 7th, since I could not overwrite them with new files. The current files are now in place. Worked on subroutines for matching intrinsic argument lists. The idea is that given formal and actual lists, we sort the actual list so that each element in the actual list corresponds to an argument in the formal list, even if we have to allocate a blank actual argument node for an optional and missing argument. Once the actual argument list has been suitably massaged, comparing formal vs actual arguments can be done by traversing both lists simultaneously. |
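A sketch of the sorting step, using arrays rather than linked lists for brevity. The structs and sort_actual() are hypothetical; a NULL entry plays the role of the blank node allocated for a missing optional argument. After this step, type and kind checks can walk the formal and sorted actual lists in lockstep.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FORMAL 16

typedef struct { const char *name; int optional; } formal_arg;

/* keyword is NULL for a positional argument */
typedef struct { const char *keyword; const char *expr; } actual_arg;

/* Reorder actual arguments into formal-argument order. sorted[i] gets the
   expression for formal i, or NULL for a missing OPTIONAL argument.
   Returns 1 on success, 0 on any mismatch. */
int sort_actual(const formal_arg *formal, int nformal,
                const actual_arg *actual, int nactual,
                const char **sorted) {
    int filled[MAX_FORMAL] = { 0 };
    if (nformal > MAX_FORMAL) return 0;
    for (int i = 0; i < nformal; i++) sorted[i] = NULL;

    int seen_keyword = 0;
    for (int a = 0; a < nactual; a++) {
        int slot = -1;
        if (actual[a].keyword == NULL) {
            /* positional after a keyword, or too many arguments */
            if (seen_keyword || a >= nformal) return 0;
            slot = a;
        } else {
            seen_keyword = 1;
            for (int f = 0; f < nformal; f++)
                if (strcmp(formal[f].name, actual[a].keyword) == 0) slot = f;
            if (slot < 0) return 0;                    /* unknown keyword */
        }
        if (filled[slot]) return 0;                    /* given twice */
        filled[slot] = 1;
        sorted[slot] = actual[a].expr;
    }
    for (int f = 0; f < nformal; f++)
        if (!filled[f] && !formal[f].optional) return 0; /* required arg missing */
    return 1;
}
```

For INDEX(STRING, SUBSTRING [, BACK]), a call like INDEX(s, substring='x') sorts to (s, 'x', missing), while a call that supplies only STRING is rejected because SUBSTRING is not optional.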
| May 10 |
Niels Jensen's patch to get rid of match_ |


