BBC BASIC to C Converter
Martin Carrudus
!BBC_C translates a program written in Acorn BBC BASIC into Acorn ANSI C, saved as a text file suitable for an Acorn C compiler. This article describes how to use the application, what !BBC_C does and the various options it presents to the user. Before trying to use !BBC_C you should read the installation notes provided with the software, or it may not work correctly. Please note that !BBC_C is Shareware; registration details can be found in the Registration file inside the archive.
To understand the advantage of translation, you need to know something about the difference between compilers and interpreters.
Compiled vs Interpreted
All computer languages need to be translated into the fundamental instructions of the particular computer (machine code). A compiler is presented with the complete program and translates the whole of it into machine code, which is then run on the computer as a separate stage. An interpreter, however, translates the program and executes it as it goes along. The advantage is that no preliminary stage is needed before you run the program. The disadvantage is that, since the interpreter never 'sees' the whole program at once, execution is usually less efficient and slower than with compiled code. In the author's experience, the compiled C that !BBC_C produces runs about three times faster than the same BBC BASIC interpreted.
To make a program quicker to interpret, it is usually stored in a coded form that makes each part easier to work out. The main 'trick' is to encode the keywords (words such as GOTO, IF or PRINT) as one or two bytes, known as a token, so that the interpreter can recognise them easily every time it comes across them. There are over 160 different tokens used in BBC BASIC files.
The interpreter also needs to know, at the start of each line, that line's number and where the next line begins. Both are stored in coded form for greater efficiency. Most types of BASIC use an interpreter, with a couple of exceptions such as STOS on the 16-bit Atari machines.
Because of the nature of BASIC, it can do things 'on the fly' that a compiled language cannot. Also, BBC BASIC was designed without a formal grammar in mind, whereas C has one. In particular, C and its compiler are strict about mixing different data types (variables designated as integer, floating point or character string), whereas BBC BASIC is far more relaxed about them. For these reasons, you should know something about C before using !BBC_C, so that you can understand any compiler error messages.
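The line layout can be illustrated with a short C sketch that walks a tokenised program held in memory. It assumes the standard Acorn format, in which each line begins with the byte &0D, then the line number as a high byte and a low byte, then a length byte counting the whole line including those four header bytes, and the program ends with &0D &FF. The function name list_line_numbers is illustrative only and is not part of !BBC_C.

```c
#include <stddef.h>

/* Walks a tokenised BBC BASIC program held in memory and records the
   line numbers it finds, using the length byte at the start of each
   line to locate the next one.
   Assumed layout of each line:
       0x0D, line number high byte, line number low byte, length byte,
       tokenised text ...
   where the length byte counts the whole line, header included.
   The program is terminated by the two bytes 0x0D 0xFF.
   Returns the number of lines found, or -1 if the data is malformed. */
int list_line_numbers(const unsigned char *prog, size_t size,
                      unsigned int *numbers, size_t max_numbers)
{
    size_t pos = 0;
    size_t count = 0;

    while (pos + 1 < size) {
        unsigned int line_no;
        size_t length;

        if (prog[pos] != 0x0D)
            return -1;                  /* every line starts with 0x0D */
        if (prog[pos + 1] == 0xFF)
            return (int)count;          /* end-of-program marker */
        if (pos + 4 > size)
            return -1;                  /* header cut short */

        line_no = ((unsigned int)prog[pos + 1] << 8) | prog[pos + 2];
        length  = prog[pos + 3];
        if (length < 4 || pos + length > size)
            return -1;                  /* impossible line length */

        if (count < max_numbers)
            numbers[count] = line_no;
        count++;
        pos += length;                  /* jump straight to the next line */
    }
    return -1;                          /* no 0x0D 0xFF terminator found */
}
```

Because each line records its own length, an interpreter can skip from line to line, for example when searching for the target of a GOTO, without decoding the text in between.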
Using !BBC_C.
Drag the application to the directory from which you want to use it.
Make sure you have the latest versions of the required modules by merging the supplied !System with yours using !SysMerge; !BBC_C issues a warning message if your modules are out of date. On all modern versions of RISC OS, double-click on !Boot to open the configuration window; clicking on 'System' then offers you the option of merging the copy of !System supplied with !BBC_C with the !System folder on your computer.
Double-click on the !BBC_C icon to install it on the icon bar, then click on the icon bar icon to bring up the application's dialogue box. Click 'Menu' over the icon bar icon for other functions, including Info and Help. The Acorn !Help application gives descriptions of the options available from the main dialogue box.
You can load a BBC BASIC program into !BBC_C by typing in its file name, dragging its file icon into the application window or onto the icon bar icon, or double-clicking on the file icon. Choose the options you require from the dialogue box and click on the 'Run' button. The effect of these options is described more fully below.
The C text file is then offered for saving, under the original name prefixed by a capital 'C'. Check it, principally for variably dimensioned arrays and for undeclared variables (which appear commented out). Then place it in a directory called 'c' inside a project directory that also contains a directory 'h', holding the files 'leaf' and 'data', and a directory 'o'; the compiler needs all of these.
Present the C file to the compiler. There may be many messages about incorrect casts, because BBC BASIC is not particular about data types whereas the C compiler is. The author has attempted to minimise such messages.
A beta version of the library 'LeafLib' must be placed in the directory '$.o'. You also need to add its file name to the 'Libraries' entry on the menu of the compiler dialogue box, separated from the other standard libraries by a comma. It is needed during the link phase of the compilation to supply certain routines that the Acorn BBC-specific library does not provide.
Although every attempt has been made to cover all the constructs available in BBC BASIC, !BBC_C sometimes fails to recover from a syntax error, in which case it produces a blank output file.
Additional Notes for Acorn C Compiler Version 5.
From Acorn C Compiler Version 5 onwards, you also need the non-standard header files 'bbc.h' and 'os.h' in your project's 'h' directory, and you must add the library 'RISC_OSlib' (needed during the link phase) to the 'Libraries' option on the menu of the main dialogue box. This is in addition to the 'LeafLib' library.
The !BBC_C translator may object to the operator '=' when it is used as a test for equality. If that is what you mean, enclose the test in brackets (e.g. A=(B=C) instead of A=B=C). Similarly, an '=' used in an FN to return a value must be separated from any preceding expression. Logical expressions after IF, WHILE and UNTIL are taken to be C logical expressions, with '&&', '||', '^^' and '!' for AND, OR, EOR and NOT respectively. If any logical sub-expression is bracketed, it is evaluated bitwise, with '&', '|', '^' and '~'. In all other contexts, logical expressions are evaluated bitwise.
!BBC_C sometimes takes an FN that returns a character string to be returning a numeric value, and the translator's grammar then objects. You can force such an FN to be treated as returning a character string by giving it a name ending in '$' (e.g. strcat$). In general, the translator determines the return type of an FN from the context in which it finds the FN.
For this reason, a logical expression that starts with an FN is initially taken to be numeric. If the FN in fact returns a character string that is compared with a string literal, e.g. FNmess>"Hello", then reversing the expression to "Hello"<FNmess makes !BBC_C take the FN as returning a character string, and no error message is produced. !BBC_C also 'remembers' the return type of an FN defined earlier in the program.
Another advantage of translating BBC BASIC to C is that C is, in theory, a portable language, and compilers for it exist on other types of computer. In simple cases it is possible to get the translated program to compile on another computer. However, more complicated BBC BASIC programs make use of facilities that exist only on Acorn computers, or that are handled differently on other machines. !BBC_C is designed only to produce code for use on RISC OS, not on other operating systems.
Two examples are embedded Acorn assembler, which is assembled directly into the machine code of Acorn computers, and SWIs (SoftWare Interrupts). SWIs are pre-written operating system routines that handle various Acorn facilities and do not exist on other machines; hundreds of them are available.
Lastly, BBC BASIC programs sometimes issue commands to the Acorn computer's operating system, and these commands are different on other computers.
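The distinction matters because BBC BASIC's AND, OR, EOR and NOT always work bit by bit; they behave as logical operators only when their operands are TRUE (-1) or FALSE (0). The hand-written C below (not !BBC_C output; the function and parameter names are illustrative) shows the two possible readings of IF flags% AND mask% THEN ... giving different answers when flags% is 1 and mask% is 2.

```c
/* BBC BASIC:  IF flags% AND mask% THEN ...
   BASIC evaluates flags% AND mask% bitwise and obeys the THEN part if
   the result is non-zero. This is the bitwise C reading. */
int basic_reading(int flags, int mask)
{
    return (flags & mask) != 0;
}

/* The C logical reading: true whenever both operands are non-zero. */
int logical_reading(int flags, int mask)
{
    return flags && mask;
}

/* With flags = 1 and mask = 2, 1 AND 2 is 0 in BBC BASIC, so the THEN
   part is skipped, yet 1 && 2 is true in C:
       basic_reading(1, 2)   returns 0
       logical_reading(1, 2) returns 1
   With TRUE (-1) and FALSE (0) as operands the two readings agree. */
```

When the operands are bit masks rather than TRUE or FALSE, bracketing the sub-expression, as in IF (flags% AND mask%) THEN, makes !BBC_C use the bitwise reading, which matches what BBC BASIC does.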
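The return type matters because C cannot compare strings with its relational operators: applied to two char pointers, < compares addresses rather than text. A hand-written sketch of equivalent C for IF "Hello"<FNmess THEN ... therefore uses the standard strcmp function. The C name messfn_s is hypothetical (it assumes '_s' as the string terminator, whose default is not stated here), and the FN body is invented for illustration.

```c
#include <string.h>

/* Stand-in for the translation of a string-valued FN, e.g.
       DEF FNmess = "World"
   The name messfn_s and the returned text are illustrative. */
const char *messfn_s(void)
{
    return "World";
}

/* BASIC:  IF "Hello" < FNmess THEN ...
   strcmp() returns a negative value when its first argument sorts
   before its second, so this is true when "Hello" precedes FNmess. */
int hello_before_mess(void)
{
    return strcmp("Hello", messfn_s()) < 0;
}
```

If the FN were wrongly taken as numeric, the translator would have to produce a plain < comparison instead, which is why the return type has to be settled before the expression can be translated.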
Using the !BBC_C Set Up dialogue box.
Input Slot.
The name of the BBC BASIC file to be translated can be typed in here. The name is inserted automatically if the file is dragged onto the dialogue box or onto the !BBC_C icon on the icon bar, and the file is also 'grabbed' if you double-click on it. When run, !BBC_C detects files that are not BBC BASIC programs.
Run Button.
Clicking here runs the BASIC to C translation. Messages are displayed on the screen to show the progress of the translation, followed by a message giving the number of identifiers used in the BBC BASIC program and the number of constructs not recognised by the grammar (syntax errors). Syntax errors result in comments in the generated C and in error message dialogue boxes on the screen, although these dialogue boxes are suppressed by default (see 'Errors:- Quiet Button' below).
The generated C code is offered for saving in the same directory as the source BBC BASIC, under the same name prefixed by a capital 'C'. Click on 'OK' or press the Return key if you are satisfied with this; otherwise alter the file name. Pressing Ctrl-U clears the slot so that you can type a new name, and you can then drag the icon to the directory of your choice.
Cancel Button.
Clicking here removes the application dialogue box from the screen and returns you to the previous set of options.
Description Button.
Ticking this option and then clicking 'Run' makes !BBC_C display two dialogue boxes describing itself.
Help Button.
Ticking this option and then clicking 'Run' makes !BBC_C display two dialogue boxes of general help information. 'Description' and 'Help' cannot both be ticked at the same time: selecting one deselects the other.
Verbose Button.
Clicking here makes !BBC_C output debugging information as it scans the BBC BASIC program, showing the identifiers, literals and tokens it encounters. This slows processing considerably, so it is advisable to leave this feature off unless you really want to see what !BBC_C is doing. At the end of the process, clicking 'Menu' over the output window lets you save the diagnostics to a separate text file. During processing, 'Menu' also lets you pause, resume or abort the process. Clicking 'Menu' over the main dialogue box gives a special menu from which you can choose 'Help', 'Debug' or 'Description', and alter the command line that controls !BBC_C.
Indexes Button.
This option makes FOR loop indexes, and the variables used as array subscripts, integer variables. It was introduced because C requires array subscripts to have an integer type. However, C does not require 'for' loop indexes to be integers, and if this option is not chosen, array subscripts are still cast to 'int' where required.
Choosing this option can cause calculations elsewhere that use the same variables to be truncated, so use it with caution.
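A small example of the truncation risk: in BBC BASIC the / operator always performs real division (DIV is the integer version), so i/2 keeps its fractional part. If the FOR index is forced to be an integer, equivalent C performs integer division instead. The two hand-written functions below (their names are illustrative, not !BBC_C output) translate the same loop both ways.

```c
/* BBC BASIC:
       total = 0
       FOR i = 1 TO 3 : total += i / 2 : NEXT
   With i kept as a real number: 0.5 + 1.0 + 1.5 = 3.0 */
double total_with_real_index(void)
{
    double total = 0.0;
    double i;
    for (i = 1.0; i <= 3.0; i += 1.0)
        total += i / 2;          /* real division, as in BASIC */
    return total;
}

/* The same loop with the index forced to int: i / 2 becomes integer
   division, giving 0 + 1 + 1 = 2.0 */
double total_with_int_index(void)
{
    double total = 0.0;
    int i;
    for (i = 1; i <= 3; i++)
        total += i / 2;          /* integer division truncates */
    return total;
}
```

Writing i / 2.0, or casting i to double, restores the BASIC result; this is the kind of manual correction the option can make necessary.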
Single Precision Button.
When enabled, this option makes all floating point variables (those holding fractional values) C type 'float'. Otherwise such variables are given the C type 'double', which offers more precision in floating point calculations. For various reasons, including this loss of precision, the option is best left disabled.
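The precision difference is easy to demonstrate. On IEEE 754 hardware a C 'float' has a 24-bit significand (about seven significant digits), whereas BBC BASIC V's five-byte reals give roughly nine, so whole numbers that BASIC holds exactly can be rounded in a 'float'. The sketch below, with illustrative function names, stores the same value in each C type.

```c
/* Stores a value in a C 'float', as the Single Precision option would,
   and widens it back to double for comparison. */
double kept_as_float(double x)
{
    float f = (float)x;
    return (double)f;
}

/* Stores the same value in a C 'double', the default. */
double kept_as_double(double x)
{
    double d = x;
    return d;
}

/* On IEEE 754 hardware:
       kept_as_float(123456789.0)  gives 123456792.0  (nearest float)
       kept_as_double(123456789.0) gives 123456789.0  (exact)         */
```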
Remove REMs Button.
When enabled, this option prevents BBC BASIC REMs (comments) from being translated into C comments in the translated code. Since a C compiler discards comments, which therefore never reach the compiled code, this option is included only for completeness.
All Lower Case Button.
Converting all identifiers to lower case, in line with C programming conventions, can cause problems. BBC BASIC programs quite often use pairs of identifiers such as 'a%' and 'A%', which would become indistinguishable if both were converted to lower case. When this option is disabled, the first character of each identifier is left unaltered. For most programs you are advised not to convert entirely to lower case.
Integer, String and Float Terminators.
In BBC BASIC, integer variable names end in '%' and character string variable names in '$'; all others are floating point. In C, every variable's type is given when it is declared, so the name need not indicate the type, and in any case '%' and '$' cannot appear in C identifiers. These writable slots supply the terminators that !BBC_C uses in their place, together with one for floating point variables. There should be no need to alter the default values supplied. If you do alter them, only upper or lower case letters or the underscore character may be entered into these slots.
Note this principal difference between BBC BASIC and C, as mentioned above: in BBC BASIC a variable becomes available as soon as it is first given a value anywhere in the code, whereas C requires every variable to be declared before it is used.
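The sketch below shows what declare-before-use means for translated code: every variable is declared, with its C type, ahead of the statements that use it. It is hand-written, not !BBC_C output; '_i' is the documented default integer terminator, while '_s' for strings and '_f' for reals are assumptions made for illustration, and run_example is a hypothetical name.

```c
/* BBC BASIC:
       count% = 3
       name$  = "Acorn"
       ratio  = count% * 1.5
   In C each variable must be declared, with its type, before use;
   !BBC_C places such declarations at the head of the generated code. */
int    count_i;   /* count%  ('_i' is the documented integer terminator) */
char  *name_s;    /* name$   ('_s' is an assumed string terminator) */
double ratio_f;   /* ratio   ('_f' is an assumed real terminator) */

void run_example(void)
{
    count_i = 3;
    name_s  = "Acorn";
    ratio_f = count_i * 1.5;
}
```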
FN/PROC Terminators.
BBC BASIC FNs (functions) and PROCs (procedures) can share a name without any conflict arising, but in C every identifier must be unique. The two-letter terminators are appended before any data type terminator, so with the defaults an FN called 'mess%' becomes 'messfn_i'.
The BBC BASIC interpreter can also cope with such names when they begin with digits or contain keywords of the language. C requires identifiers to begin with a letter, so digits are converted to the letters 'a' to 'i' (for 1 to 9) and 'z' (for 0): for example, '1st' becomes 'ast', '0th' becomes 'zth', and a procedure called PROCTO becomes 'procto'.
There should not be any need to alter the default values. As with variable name terminators only up to two upper or lower case characters or the underscore character may be entered in these slots.
Array Terminators.
As with FNs and PROCs, BBC BASIC has no difficulty when an array has the same name as a variable, FN or PROC, but C requires all identifiers to be unique. This option supplies a three-letter terminator for array names in the translated C. It is added in addition to any data type terminator, so with the defaults an array called 'data%' in BBC BASIC becomes 'dataarr_i' in the translated C.
There should not be any need to alter the default value supplied. As above, only upper and lower case letters or the underscore character may be entered in this slot.
DEF Argument Terminators.
It was found that the names of the formal parameters of FNs and PROCs in DEF statements were being confused with variables of the same name used elsewhere. These parameter names are purely dummy names, so this facility lets you give them a terminator that distinguishes them from other variable names. As before, this terminator is in addition to the data type terminator, and the default value, 'arg', need not be altered.
Should you wish to disable this facility, clear the icon.
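The naming rules in the last few sections can be combined into one small C sketch. It uses the documented defaults '_i' (integer), 'fn', 'arr' and 'arg', assumes the All Lower Case option is on, and uses '_s' and '_f' as stand-ins for the string and real terminators, whose defaults are not given here. PROC names and keyword handling are not modelled, and it converts only a leading digit, which is one reading of the rule above. The function mangle and the enum name_kind are hypothetical, not !BBC_C's own code.

```c
#include <ctype.h>
#include <string.h>

enum name_kind { NAME_VARIABLE, NAME_FN, NAME_ARRAY, NAME_ARGUMENT };

/* Builds a C identifier from a BBC BASIC name such as "mess%".
   Writes the result to out (out_size bytes including the final '\0').
   Returns 0 on success, -1 if the name is empty or does not fit. */
int mangle(const char *basic, enum name_kind kind, char *out, size_t out_size)
{
    /* Kind terminators: documented defaults; plain variables get none. */
    static const char *const kind_term[] = { "", "fn", "arr", "arg" };
    const char *type_term = "_f";             /* assumed real terminator */
    size_t len = strlen(basic);
    size_t i;

    if (len > 0 && basic[len - 1] == '%') {
        type_term = "_i";                     /* documented default */
        len--;
    } else if (len > 0 && basic[len - 1] == '$') {
        type_term = "_s";                     /* assumed string terminator */
        len--;
    }
    if (len == 0 ||
        len + strlen(kind_term[kind]) + strlen(type_term) + 1 > out_size)
        return -1;

    /* All Lower Case assumed on: copy the name body in lower case. */
    for (i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)basic[i]);
    out[len] = '\0';

    /* C identifiers cannot start with a digit: 1-9 become a-i, 0 becomes z. */
    if (isdigit((unsigned char)out[0]))
        out[0] = (out[0] == '0') ? 'z' : (char)('a' + (out[0] - '1'));

    strcat(out, kind_term[kind]);             /* kind terminator first ... */
    strcat(out, type_term);                   /* ... then the data type one */
    return 0;
}
```

Under these assumptions mangle gives 'messfn_i' for the FN mess% and 'dataarr_i' for the array data%, matching the examples above, and 'a_i' for both A% and a%, which is the collision described under All Lower Case Button.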
Primary and Secondary Indentation.
The single digits entered in these slots determine the indentation applied to the translated C. The primary indentation applies to the whole code: it indents every line within the main program and the C functions by the given number of spaces, providing an initial margin that makes the code more readable.
The secondary indentation is added to the primary for each programming structure encountered (IF, REPEAT, WHILE, FOR and CASE). It is reduced by the same amount when the structure ends (with ENDIF, UNTIL, ENDWHILE, NEXT or ENDCASE respectively).
This also makes the translated C code more readable, which helps because the supplied BBC BASIC may have been 'squeezed' to remove spaces, since spaces slow down interpreted BASIC. The indentation in no way affects the speed or size of the compiled code, as the compiler ignores the extra spaces.
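The indentation rule amounts to simple arithmetic: a line nested inside n structures is indented by primary + n × secondary spaces. A minimal C sketch of that rule, with an illustrative function name, writes one indented line into a buffer.

```c
#include <string.h>

/* Writes text into out, preceded by primary + depth * secondary spaces,
   where depth is the number of enclosing IF, REPEAT, WHILE, FOR and
   CASE structures. Returns 0, or -1 if out is too small. */
int indent_line(char *out, size_t out_size, const char *text,
                int primary, int secondary, int depth)
{
    size_t spaces = (size_t)(primary + depth * secondary);

    if (spaces + strlen(text) + 1 > out_size)
        return -1;
    memset(out, ' ', spaces);       /* the margin */
    strcpy(out + spaces, text);     /* then the statement itself */
    return 0;
}
```

With a primary indentation of 2 and a secondary of 3, a statement inside a FOR loop within an IF block (depth 2) is preceded by 2 + 2 × 3 = 8 spaces.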
Errors:- Quiet Button.
When enabled, this option suppresses the error dialogue boxes generated when a syntax (or parse) error is detected in the BBC BASIC program. Such an error is not necessarily a fault in the BBC BASIC program; it may instead be a deficiency in the grammar that !BBC_C uses to parse the program.
Some programs produce many error messages, so this option is enabled by default to avoid constantly having to respond to the error dialogue boxes.
If !BBC_C terminates with an 'Untrapped Parser Error' message, it has fallen over at some point and the output file will be empty. In every case, the total number of errors is given. Use the tips at the end of this article to reduce the errors to a minimum.
In some cases, badly bracketed programming structures make !BBC_C search for a terminating statement (containing ENDIF, ENDWHILE, UNTIL, NEXT or ENDCASE) that is never found, and the program falls over.
!BBC_C does attempt to keep track of structures and to close higher structures when lower ones close; for example, a WHILE loop within a CASE structure is closed with an injected ENDWHILE when a new WHEN or OTHERWISE is encountered.
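One way to implement this kind of recovery is a stack of open structures. The sketch below is an assumption about the approach, not !BBC_C's own code, and all its names are hypothetical: when a WHEN or OTHERWISE arrives, every structure opened since the innermost CASE is popped and its closing keyword injected.

```c
#define MAX_DEPTH 64

enum structure { S_IF, S_REPEAT, S_WHILE, S_FOR, S_CASE };

/* Keyword that closes each kind of structure. */
static const char *const closer[] = { "ENDIF", "UNTIL", "ENDWHILE", "NEXT", "ENDCASE" };

struct nesting {
    enum structure open[MAX_DEPTH];
    int depth;
};

/* Records a newly opened structure. Returns -1 if nesting is too deep. */
int open_structure(struct nesting *n, enum structure s)
{
    if (n->depth >= MAX_DEPTH)
        return -1;
    n->open[n->depth++] = s;
    return 0;
}

/* Called on WHEN or OTHERWISE. Closes every structure opened inside the
   innermost CASE, storing the injected closing keywords in injected[]
   (up to max_injected of them, innermost first). Returns how many were
   closed, or -1 if no CASE is open, in which case nothing changes. */
int close_inside_case(struct nesting *n, const char **injected, int max_injected)
{
    int i;
    int count = 0;

    for (i = n->depth - 1; i >= 0 && n->open[i] != S_CASE; i--)
        ;                                 /* look for the innermost CASE */
    if (i < 0)
        return -1;

    while (n->depth - 1 > i) {
        n->depth--;
        if (count < max_injected)
            injected[count] = closer[n->open[n->depth]];
        count++;
    }
    return count;
}
```

The same stack also shows why badly bracketed code is hard to recover from: if the expected CASE (or other structure) was never opened, there is nothing on the stack to close back to.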
Errors:- Maximum.
!BBC_C aborts if the number of errors exceeds the number shown, which is 300 by default. If you do not want it to abort, increase this number; experience shows, however, that when a large number of errors are generated !BBC_C usually falls over and produces a blank output file. Sometimes this is caused by not selecting the correct options for your program from those described above.
Order of Identifier Declarations.
!BBC_C keeps an internal table of all the identifiers (the symbol table), which can be kept in different sorted orders. The writable icon can contain only 0, 1 or 2.
With 0 in it, the identifiers are just added to the end of the list every time a new identifier is encountered, so they are declared at the head of the generated C code in 'As Found' order. With 1 in the icon, the identifiers are kept in alphabetical order and with 2 in the icon, the identifiers are further sorted on their 'type', that is whether they are variables, functions or procedures. The option '2' is selected by default.
You are recommended to select at least option '1', or better still '2', because this makes it easier to find a particular identifier among the declarations at the head of the generated C code. Also, with 'As Found' order, searching the symbol table for new identifiers is less efficient and can slow !BBC_C down.
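Settings 1 and 2 can be sketched with the standard library's qsort, and the search benefit with bsearch, which only works on a sorted table. The structure, the type grouping order and the function names are hypothetical; !BBC_C's internal table is not documented here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry. */
enum sym_type { SYM_VARIABLE, SYM_FUNCTION, SYM_PROCEDURE };

struct symbol {
    const char *name;
    enum sym_type type;
};

/* Setting 1: alphabetical order. */
static int by_name(const void *a, const void *b)
{
    const struct symbol *x = (const struct symbol *)a;
    const struct symbol *y = (const struct symbol *)b;
    return strcmp(x->name, y->name);
}

/* Setting 2: grouped by type, alphabetical within each group. */
static int by_type_then_name(const void *a, const void *b)
{
    const struct symbol *x = (const struct symbol *)a;
    const struct symbol *y = (const struct symbol *)b;
    if (x->type != y->type)
        return (x->type < y->type) ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* order: 0 = as found (unchanged), 1 = by name, 2 = by type then name */
void order_symbols(struct symbol *table, size_t n, int order)
{
    if (order == 1)
        qsort(table, n, sizeof *table, by_name);
    else if (order == 2)
        qsort(table, n, sizeof *table, by_type_then_name);
}

/* Binary search of a table in setting-1 order: about log2(n) comparisons,
   instead of the up to n that a scan of an 'As Found' table needs.
   (For setting 2 the same works if the type is part of the key.) */
const struct symbol *find_symbol(const struct symbol *table, size_t n,
                                 const char *name)
{
    struct symbol key;
    key.name = name;
    key.type = SYM_VARIABLE;     /* ignored by by_name */
    return (const struct symbol *)bsearch(&key, table, n, sizeof *table, by_name);
}
```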
Keep Separate List of Local Variables.
When this option is enabled, the variables local to a BBC BASIC function or procedure are kept in a separate symbol table, which is renewed for each new function or procedure. This means that variable data types are only determined from their behaviour within the function or procedure and not the behaviour of any global variable with the same name.
Variables local to the function or procedure are their arguments and any variables declared as LOCAL in BBC BASIC within the routine.
This option can often reduce the number of syntax errors, and BBC BASIC functions are often given much more accurate data types for the values they return. For these reasons, this option is enabled by default.
Named Wimp Control Blocks.
By default this option is off and its writable icon is disabled. When enabled, it allows you to specify the BBC BASIC names of up to ten variables used as Wimp control blocks. These blocks are used in SWI (SYS) calls, particularly in Wimp applications.
Such names may consist of upper and lower case alphanumeric characters and the symbols '%', '$' and underscore; no other characters are accepted in the writable icon. The names must be separated by commas, and the whole string of names cannot exceed 100 characters. Such variables are given the data type of a string variable (char *) in the generated C code.
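Declaring such a block as char * suits the way BASIC programs access it, with indirection operators such as block%!4 (the four-byte word at offset 4) and block%?8 (the byte at offset 8). A portable C sketch of the equivalents follows, assuming the little-endian word order used by RISC OS on ARM; the helper names are illustrative and are not part of !BBC_C or its libraries.

```c
/* Equivalent of BASIC's  block%!offset  : a 32-bit little-endian word.
   (BASIC treats the word as signed; an unsigned result keeps this simple.) */
unsigned long word_at(const char *block, int offset)
{
    const unsigned char *p = (const unsigned char *)block + offset;
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Equivalent of  block%!offset = value  */
void set_word_at(char *block, int offset, unsigned long value)
{
    unsigned char *p = (unsigned char *)block + offset;
    p[0] = (unsigned char)(value & 0xFF);
    p[1] = (unsigned char)((value >> 8) & 0xFF);
    p[2] = (unsigned char)((value >> 16) & 0xFF);
    p[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Equivalent of BASIC's  block%?offset  : a single byte, 0 to 255. */
int byte_at(const char *block, int offset)
{
    return ((const unsigned char *)block)[offset];
}
```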
Further Technical Information.
For further information concerning the processing that !BBC_C does with BBC BASIC, consult 'TechNote' which is present within the application. In particular note that variably dimensioned arrays need suitable fixed values for their dimensions to be filled in manually within the generated C code (in #define directives at the head of the code). If this is not done, then C compiler errors will occur.
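For example, a BASIC array whose size is only known at run time must be given a fixed bound in C. In the hand-written sketch below the macro name TABLE_MAX and the function fill_table are hypothetical. Note that BBC BASIC's DIM a%(n) creates n+1 elements, numbered 0 to n, so the C array needs one more element than the bound.

```c
#include <assert.h>

/* BBC BASIC:
       DIM table%(n%)          where n% is only known at run time
   C needs the size at compile time, so a maximum is filled in by hand.
   TABLE_MAX is a hypothetical name; choose a value large enough for
   every n% the program can produce. */
#define TABLE_MAX 50

int tablearr_i[TABLE_MAX + 1];    /* DIM x(n) has n+1 elements, 0 to n */

/* Fills elements 0..n with their squares, as a loop over table%() might. */
void fill_table(int n)
{
    int i;
    assert(n >= 0 && n <= TABLE_MAX);   /* catch a bound that is too small */
    for (i = 0; i <= n; i++)
        tablearr_i[i] = i * i;
}
```

The assert turns a bound that was filled in too small into an immediate, clearly reported failure rather than a silent write past the end of the array.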
Certain BBC BASIC constructs were found to be untranslatable into an equivalent construct in the C language. !BBC_C will warn you in the generated C code where this occurs and the supplied 'TechNote' gives a list of those things that !BBC_C is unable to handle. Most of these constructs are only rarely used.
Writable Icons.
Pointing the mouse at a writable icon and clicking 'Select' (left hand key) will insert a red cursor in the icon concerned. The backspace or delete key will delete characters to the left. Type into icons from the keyboard. Those icons with a white background will only accept alphabetic characters from the keyboard, whilst those in light yellow will only accept numeric characters.
If you clear a variable terminator icon, !BBC_C reverts to the default for that icon; for example, a cleared integer variable terminator takes the value '_i'. The other terminator icons, however, can be cleared to an empty string.
Having entered a writable icon, pressing the 'Return' (Enter) key moves the cursor to each writable icon in turn. Pressing 'Return' when the final icon has been reached runs the application.
StartUp Banner dialogue box.
On double-clicking with the left hand mouse button over the !BBC_C application icon, a banner dialogue box appears giving proprietary information about !BBC_C, and at the same time the !BBC_C icon appears on the icon bar. The banner stays for about five seconds, but you can remove it immediately by clicking once over it. It appears only once in any session on the computer: if you quit !BBC_C and carry on working without switching off the computer, then start !BBC_C again, the banner does not reappear.
Description of the Icon Bar Menu.
When the !BBC_C application is run by double-clicking on its icon, the icon is loaded onto the icon bar. Clicking 'Select' on this icon bar icon gives the 'Set Up' dialogue box described above, while clicking 'Menu' presents a menu with the entries described below.
Clicking on 'Quit' immediately removes the icon from the icon bar and aborts the application.
Clicking on 'Help' causes a window to appear giving descriptions of the options available from the 'Set Up' dialogue box.
Clicking on 'Save options' permanently saves the options you have chosen from the 'Set Up' dialogue box and from this menu to a file within !BBC_C, so that your copy of the application starts with your options rather than the defaults every time you run it.
Moving right over the arrow beside 'Info' displays an information box, which principally tells you the version and date of your copy of the translator.
Moving right over the arrow beside 'Options' displays the Options submenu.
This submenu offers two options. 'Auto Run' makes !BBC_C run automatically as soon as it receives a file name, without your having to click the 'Run' button. 'Auto Save' makes it save the translated C automatically, without your having to click 'OK' in the 'Save As' box mentioned above. Each option is enabled by clicking on its menu item, which is then ticked.
Moving right over the arrow beside 'Display' displays the Display submenu.
By default, !BBC_C produces textual information when it runs. If you tick 'Summary', a Summary window is displayed instead. The number of lines it gives is not the number of lines of C code output, but the number of lines of text that would have been displayed had the 'Text' option been chosen. 'Summary' and 'Text' cannot both be chosen at the same time.
All the menu options chosen will be saved when 'Save options' is clicked on.
Tips for avoiding !BBC_C Syntax Errors.
Determining return values of FNs.
As mentioned above, reversing inequalities that contain FNs often avoids syntax errors. !BBC_C also 'remembers' the return type of an FN that has been defined above the point in the code where it is called. FNs returning numeric values are treated differently from those returning character strings.
Bitwise ANDs, ORs, NOTs and EORs.
As mentioned above, expressions after IF, WHILE and UNTIL are taken to be C logical expressions. In all other cases (e.g. in assignments), expressions are evaluated bitwise, using '&', '|', '~' and '^'. For these purposes, the BBC BASIC keyword TRUE is translated to a #define macro _TRUE, which has the value -1. Similarly, FALSE becomes _FALSE, with the value 0. Note that in ANSI C a true logical expression has the value 1, whereas in BBC BASIC it has the value -1.
Also, where a logical expression is expected, !BBC_C can distinguish between expressions that are to be taken in a logical manner and those to be taken bitwise. Expressions that mix the two cause !BBC_C to emit syntax error messages.
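The value -1 matters because TRUE then has every bit set, so bitwise operations on it behave logically, which is not the case for C's own true value of 1. The sketch below defines _TRUE and _FALSE with the values stated above; the function basic_equals is illustrative only.

```c
#define _TRUE  (-1)
#define _FALSE 0

/* BASIC-style comparison result: -1 when equal, 0 otherwise. */
int basic_equals(int a, int b)
{
    return (a == b) ? _TRUE : _FALSE;
}

/* Bitwise NOT turns TRUE into FALSE and back:
       ~_TRUE  == _FALSE   and   ~_FALSE == _TRUE
   but C's own true value, 1, does not survive it:
       ~1 == -2, which is still non-zero, i.e. still "true".
   Likewise (_TRUE & 4) == 4, whereas (1 & 4) == 0. */
```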
Badly bracketed programming structures.
!BBC_C attempts to close higher programming structures when lower ones close, but sometimes cannot cope. In particular, failing to close a DEF PROC with a final ENDPROC can cause problems. If your program makes !BBC_C produce a blank output file, the cause could be bad structure bracketing.
Returning values from FNs.
Returning a value from an FN using '=' within an IF statement can cause problems when the THEN is omitted. The only solution is to insert the THEN into the BBC BASIC code.
Similarly, a test for equality in an IF statement (e.g. IF a%=b% c%+=1) should be bracketed, as in IF (a%=b%) c%+=1, because of the same conflict over whether the '=' is a test for equality or returns a value from a function.
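The two meanings of '=' map onto quite different C constructs, which is why the translator has to decide between them. A hand-written sketch; the function and parameter names are illustrative and not exact !BBC_C output.

```c
/* BBC BASIC:
       DEF FNmax(a, b)
       IF a > b THEN =a ELSE =b
   Here '=' returns a value from the FN: a C return statement. */
double maxfn(double aarg, double barg)
{
    if (aarg > barg)
        return aarg;
    else
        return barg;
}

/* BBC BASIC:  IF (a%=b%) c%+=1
   Here '=' tests for equality: C's == operator. */
int bump_if_equal(int a_i, int b_i, int c_i)
{
    if (a_i == b_i)
        c_i += 1;
    return c_i;
}
```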
EVAL, Variable GOTOs and Variable GOSUBs.
!BBC_C is unable to cope with these constructs. EVAL produces an error message in the code, and variable GOTOs and GOSUBs result in meaningless C code. 'Structured programming' techniques show that you should be able to rewrite your program to avoid these sorts of construct.
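As an illustration, a computed jump such as GOTO 1000+choice%*100 can usually be replaced in BASIC by a CASE statement, which has a direct C equivalent in switch. The sketch below is hand-written; the function name dispatch and the values are invented for illustration.

```c
/* BBC BASIC, restructured to avoid GOTO 1000+choice%*100:
       CASE choice% OF
         WHEN 1 : result% = 10
         WHEN 2 : result% = 20
         OTHERWISE result% = 0
       ENDCASE
   Equivalent C uses a switch statement. */
int dispatch(int choice)
{
    int result;

    switch (choice) {
    case 1:
        result = 10;
        break;
    case 2:
        result = 20;
        break;
    default:            /* OTHERWISE */
        result = 0;
        break;
    }
    return result;
}
```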
Martin Carrudus