Assignment3 is due at 9pm Tuesday, February19 . It is worth 40 points. Procedures This assignment is to be done individually. Turn in answers to the exercises below on theUA Blackboard Learnsite,...






Assignment3 is due at9pm Tuesday, February19. It is worth 40 points.


Procedures


This assignment is to be done individually.


Turn in answers to the exercises below on theUA Blackboard Learnsite, under Assignment3 for this class.



  • Your answers should consist of the answer to ExerciseA (submitted as an attached text/PDF file, Text Submission, or Comments), and the source code for Exercise B (filelexit.lua).

  • Your homework submission may not be examined immediately. If you have questions,e-mail me.


Exercises (40 pts total)


Exercise A — Running a Forth Program


Purpose


In this exercise you will make sure you can execute Forth code.


Instructions


Get the filecheck_forth.fsfrom the class Git repository. This is a Forth source file for a complete program. When it is executed, this program prints “Secret message #3:”. Below that, it prints a secret message. Run the program. What is the secret message?


Note. As always, there are several ways to run this program. One of them is to start up the GNU version of Forth (Gforth), and type the following.



[Interactive Forth]


include check_forth.fs


Exercise B — Lexer in Lua


Purpose


In this exercise you will write a Lua module that does lexical analysis.


In the next assignment, you will build a parser on top of your lexer. And in a later assignment, you will build an interpreter that uses the output from your parser.


Instructions


Write a Lua modulelexit, contained in the filelexit.lua. Your module should do lexical analysis using the state-machine approach.


Be sure to follow theCoding Standards.



  • The interface of modulelexitis very similar to that of modulelexer, which was written in class—with some differences, to be covered shortly. In particular,lexitexports:

    • Functionlex(i.e.,lexit.lex), which takes a string parameter and allows for-in iteration through lexemes in the passed string.

    • Numerical constants representing lexeme categories. These constants are shown in a table below.

    • Tablecatnames, which maps lexeme category numbers to printable strings.



  • The interface of modulelexitdiffers from that oflexeras follows:

    • Lexing is done based on theLexeme Specificationin this document (below), not the one distributed in class.

    • The exported numerical constants and category names are different.



  • Modulelexitshould export nothing other than tablecatnames, seven constants representing lexeme categories, and functionlex. You may write anything you want in the source code for the module, as long as it islocaland not exported.


The following properties of modulelexershould hold for modulelexitas well.



  • At each iteration, the iterator function returns a pair: a string, which is the string form of the lexeme, and a number representing the category of the lexeme.

  • The number mentioned in the previous point is suitable as a key for tablecatnames.

  • The iteration ends when there are no further lexemes.


The correspondence between lexeme category numbers and category names/strings should be as follows.














































Category
Number
Named
Constant
Printable
Form
1
lexit.KEY

Keyword
2
lexit.ID

Identifier
3
lexit.NUMLIT

NumericLiteral
4
lexit.STRLIT

StringLiteral
5
lexit.OP

Operator
6
lexit.PUNCT

Punctuation
7
lexit.MAL

Malformed

Thus, the following code should work.



[Lua]


lexit = require "lexit"  program = "x = 3 # Set a variable\n write(x+4, cr)\n"  for lexstr, cat in lexit.lex(program) do     print(lexstr, lexit.catnames[cat]) end


Lexeme Specification


You will write a lexer that is to be part of an interpreter for a programming language calledJerboa.



Whitespacecharacters are blank, tab, vertical-tab, new-line, carriage-return, form-feed. Except forStringLiterallexemes, no lexeme may contain a whitespace character. So a whitespace character, or any contiguous group of whitespace characters, is a separator between lexemes. However, pairs of lexemes are not required to be separated by whitespace.


Acommentbegins with pound sign (#) occurring outside aStringLiterallexeme, and ends at a newline character or the end of the input, whichever comes first. There are no other kinds of comments. Any character at all may occur in a comment.


Comments are treated by the lexer as whitespace: they are not part of lexemes and are not passed on to the caller.



Legalcharacters outside comments andStringLiterallexemes are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments andStringLiterallexemes areillegal.


Once a lexeme has begun, the maximal-munch rule is followed,except in the following special case. If a lexeme begins with “+” or “-”, and it is not the first lexeme in the input, and the previous lexeme was anIdentifier,NumericLiteral, right parenthesis (“)”), right bracket (“]”),true, orfalse, then this lexeme is a one-characterOperator.


There are seven lexeme categories:Keyword,Identifier,NumericLiteral,StringLiteral,Operator,Punctuation,Malformed.




Keyword

One of the following twelve:
cr
def
else
elseif
end
false
if
readnum
return
true
while
write


Note. The reserved words are the same as theKeywordlexemes.




Identifier

A letter or underscore (_), followed by zero or more characters that are all letters, digits, or underscores, and not a keyword. (Examples: “myvar”, “_”, “_x_37_”.)


NumericLiteral

A sequence of one or more digits, possibly preceded by an optional plus-sign (+) or minus-sign (-), and possibly followed by an optionalexponent.

Anexponentis the letter “e” or “E” followed by an optional “+”, and then one or more digits.



Notes. ANumericLiteralcannot contain a dot (
.
). A minus sign is not legal in an exponent. A plus sign is legal, and optional, in an exponent. An exponent must contain at least one digit.


For example, here are some validNumericLiterallexemes.



1234   -23   00900   +123e+7   -00E00   3e888


The following arenotvalidNumericLiterallexemes.



+3e   e   123E+   +-3   1.23   123e-7



The first string above is aNumericLiteral(
+3
) followed by anIdentifier(
e
). The second is anIdentifier(
e
). The third is aNumericLiteral(
123
), anIdentifier(
E
), and anOperator(
+
). The fourth is anOperator(
+
) and aNumericLiteral(
-3
). The fifth is aNumericLiteral(
1
), aPunctuation(
.
), and aNumericLiteral(
23
). The last is aNumericLiteral(
123
), anIdentifier(
e
), and aNumericLiteral(
-7
).




StringLiteral

A single quote (') or double quote ("), followed by zero or more characters that are not the same as the initial quote mark and not newline, followed by a duplicate of the initial quote mark. AStringLiteralends with the same character it began with. The beginning and ending quote marks are both part of the lexeme. Any character at all may occur between the quotes of aStringLiteral, except for newline and the quote that began the literal. There are no backslash escapes. (Examples: “'Hello'”, “"bye-bye"”, “'abc"abc'”.)


Operator

One of the following seventeen:
&&
||
!
==
!=


>
>=
+
-
*
/
%
[
]
=


Punctuation

Any single legal character that is not whitespace, not part of a comment, and not part of any valid lexeme in one of the other categories, includingMalformed. (Examples: “&”, “(”.)


Malformed

There are two kinds ofMalformedlexemes:bad characterandbad string.

Abad characteris any single character that is illegal, that is not part of a comment or aStringLiterallexeme that began earlier.


Abad stringis essentially a partialStringLiteralin which either a newline character or the end of the input is reached before the ending quote mark. It begins with a single or double quote mark that is not part of a comment or aStringLiterallexeme that began earlier, and continues to either a newline character (which is included in the lexeme) or the end of the input, without the matching ending quote mark appearing. (Examples: “'a-b-c”, “"xyz”, both either with a newline appended or ending at the end of the input.)



Note. The two kinds ofMalformedlexemes are presented to the caller in the same way: they are both simplyMalformed.




Test Program


A test programis available in the Git repository:lexit_test.lua. If you compile and run this program (unmodified!) with your code, then it will test whether your code works properly.


Do not turn in the test program.


Notes



  • You will use your code from Assignment3 again in Assignment4. It will also need to work with your code in Assignment6. Do an extra good job!

  • SeeThoughts on Assignment3, on the lecture slides for February11.

Feb 18, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here