CS/CE 4337 Project Fall 2021 Project (Part 1) This project will explore the topics learned in class, and give you some simple hands- on experience. After completing all three parts, you should have a...

1 answer below »
I need help help on a computer project , should have a knowledge of syntax ,semantics, lexer and token and so on . it will be an ongoing project throughout the semester. because each topic covered there would some changes we have to make in the program.


CS/CE 4337 Project Fall 2021 Project (Part 1) This project will explore the topics learned in class, and give you some simple hands- on experience. After completing all three parts, you should have a working interpreted language that is unique to you. You are encouraged to have fun and go above and beyond if you have the time. Part 1 gives you a simple imperative language. Study the syntax and semantics pre- sented in this document. Be sure to understand the language defined. You will then need to make a small, but substantial change, and create the lexer for the modified language. You will be making other changes to your language as the semester goes forward. So, take that into consideration when writing your code. All code should be written in either C,C++,Java, or Python. 1 What to do • Study the syntax and semantics • Modify the language: The modification should be significant, requiring a change in both syntax and seman- tics. You can add a new feature (like a for loop) or modify the way a feature works (Like making user input more advanced). Examples: – Add a for loop – Change get to accept a list of IDs and get multiple integers – Add a simple printf – add simple array • Construct a state diagram for the lexer. The tokens and lexemes may be affected by your changes above. • Implement the lexer in your chosen language. You should include a test driver pro- gram that runs the lexer and prints all tokens to the screen – like the example from class. 2 What to turn in Upload your submission as a zip archive containing the following: • Source code (c, c++, python, or java files) – Source code should not require a particular IDE to compile and run. – Should work on the cs1 and cs2 machines • Readme (Plain text document) Organization of Programming Languages Page 1 CS/CE 4337 Project Fall 2021 – List the files included in the archive and their purpose – Explain how to compile and run your project – Include any other notes that the TA may need • Write-up (Microsoft Word or pdf format) – The state transition diagram – In addition, if you did not complete some feature of the project, why not? ∗ What unsolvable problems did you encounter? ∗ How did you try to solve the problems? ∗ Where do you think the solution might lay? · What would you do to try and solve the problem if you had more time? 3 Grading The grade for this project will be out of 100, and broken down as follows: The change to the language 25 The state transition diagram 15 The lexer code 50 Correct Output 10 If you were not able to complete some part of the program discussing the problem and potential solutions in the write-up will reduce the points deducted for it. For example, suppose there is a bug in your code that sometimes allows two customers to approach the same worker, and could not figure out the problem before the due date. You can write 2-3 paragraphs in the write-up to discuss this issue. Identify the error and discuss what you have done to try to fix it/find the problem point, and discuss how you would proceed if you had more time. Overall, inform me and the TA that you know the problem exists and you seriously spend time trying to fix the problem. Normally you may lose 5 points (since it is a rare error) but with the write-up you only lose 2. These points can make a large difference if the problem is affecting a larger portion of the program. 4 The Language 4.1 Syntax Ñ (1) Ñ �(2) | “;” (3) Ñ (4) |
(5) | (6) | (7) Organization of Programming Languages Page 2 CS/CE 4337 Project Fall 2021 | (8) Ñ “print” (9) Ñ STRING(10) | (11)

Ñ “get” ID(12) Ñ ID “=” (13) Ñ “if” “then” “else” “end”(14) Ñ “while” “do” “end”(15) Ñ (16) Ñ �(17) | “and” (18) | “or” (19) Ñ (20) Ñ �(21) Ñ (22) Ñ �(23) | “*” (24) | “/” (25) | “%” (26) Ñ (27) Ñ �(28) | “>” (29) | “>=” (30) | “<”>(31) | “<=”>(32) | “==” (33) | “!=” (34) Ñ “(” “)”(35) | “not” (36) | “-” (37) | ID(38) | INT(39) 4.1.1 Tokens This subsection describes the token used in the above grammar. Provided for each token is a regex and a description. The regex is for those that know regular expressions and prefer it as a description. The description says the same thing in English. Preprocessing describes how the lexeme is transformed before passing it to the parser. Organization of Programming Languages Page 3 CS/CE 4337 Project Fall 2021 STRING As a regex: "([ˆ"]|\")*" Description: A quotation mark followed by zero or more characters, where quo- tation marks must be preceded by a backslash, followed by another quotation mark. Preprocessing: The first and last quotation marks are removed. Scanning from left to right, “\\” is replaced with “\”, “\t” is replaced with a tab, “\n” is replaced with a newline, “\"” is replaced with “"”, and any “\” that is followed by anything else is removed. ID As regex: [_a-zA-Z][_a-zA-Z0-9]* Description: A letter or underscore followed by a combination of zero or more let- ters, underscores or digits. INT As Regex: (+|-)?[0-9]+ Description: an optional “+” or “-” followed by one or more digits. 4.2 Static Semantics 1 .ids “ tu 3 r0s.ids “ t .idu Y r0s.ids .ids “ r0s.ids 4 .ids “ .ids 5 .id “

.id 6 .id “ .id .ids “ .ids 7 .ids “ .ids 8 .ids “ .ids 9 .ids “ .ids 11 .ids “ .ids 12

.id “ ID .id 13 .id “ ID .id 16 .ids “ .ids “ Organization of Programming Languages Page 4 CS/CE 4337 Project Fall 2021 18 .ids “ 19 .ids “ 20 .ids “ .ids “ 22 .ids “ .ids 23 .ids “ .ids 24 .ids “ .ids .ids “ .ids 26 .ids “ .ids 27 .ids “ .ids 28 .ids “ .ids 29 .ids “ .ids .ids “ .ids 31 .ids “ .ids 32 .ids “ .ids 33 .ids “ .ids 34 .ids “ .ids 35 .ids “ .ids 36 .ids “ .ids 37 .ids “ .ids 38 r1s.ids “ r0s.ids 39 r1s.ids “ r0s.ids 40 Predicate: ID .id P .ids 4.3 Dynamic Semantics This section gives the dynamic semantics of the language using denotational semantics. Consider the demsem function the denotational semantics for this language. We will use a mapping from variable name to value to represent the symbol table of the program during execution, and in code can be represented as a HashMap or similar datatype in your language of choice. We will use a sequence of characters to represent the output of a program, with � representing the empty sequence. I will also assume that all strings will be represented as sequences of characters. Assume there is a function append that, when given two sequences, appends the second sequence to the first. Also assume, there is a function Organization of Programming Languages Page 5 CS/CE 4337 Project Fall 2021 seq that takes an integer and gives a sequence of characters representing that integer as text. Assume there are the functions head, which maps a sequence to its first element, tail, which maps a sequence to a new one created by removing the first element, clean, which maps a sequence of input characters to a new sequence by removing any non-digits from the front of the sequence, and int that maps a sequence of digits to the corresponding integer. If the sequence is empty, int will give zero. A state, as well as the meaning of a program, will be a 3-tuple consisting of a variable name mapping function, a sequence of input characters and an output sequence. The initial state for any program is ptu, i, �q, where i is some sequence of characters the user will input. If a token (represented by all caps and bold font) appears as a value on the right hand side of a function definition, then replace it with its lexeme. So if a ID was generated by the lexer from an x, then replace ID with x. densemp�, pθ, i, pqq “ pθ, i, pq densemp “;” , pθ, i, pqq “ densemp , densemp , pθ, i, pqqq densemp “print” STRING , pθ, i, pqq “ pθ, i, appendpp, STRING qq densemp “print” , pθ, i, pq “ pθ, i, appendpp, seqpoutqqq where out “ exprsemp q densemp “get” ID , pθ, i, pqq “ pθ1, i1, pq where px, i1q “ getIntpcleanpiqq θ1pnq “ if n “ ID then x else θpnq densemp ID “=” , pθ, i, pqq “ pθ1, i, pq where θ1pnq “ if n “ ID then exprsemp , θq else θpnq densemp , pθ, i, pqq “ if exprsemp . , θq ‰ 0 then densemp . , pθ, i, pqq else densemp . , pθ, i, pqq densemp , pθ, i, pqq “ if exprsemp . , θq “ 0 then pθ, i, pq else densemp , densemp . , pθ, i, pqqq exprsemp , θq “ if . “ � then exprsemp . , θq else bexprsemp . , exprsemp . q, θq exprsemp , θq “ if . “ � then exprsemp . , θq else texprsemp . , exprsemp . q, θq Organization of Programming Languages Page 6 CS/CE 4337 Project Fall 2021 exprsemp , θq “ if
Answered 11 days AfterSep 26, 2021

Answer To: CS/CE 4337 Project Fall 2021 Project (Part 1) This project will explore the topics learned in class,...

Kamal answered on Oct 08 2021
120 Votes
#include
#include
#include
#include
#define int long long // to work with 64bit address
int debug; // print the executed instructions
int assembly; // print out the assembly and source
int token; // current token
// instructions
enum { LEA ,IMM ,JMP ,CALL,JZ ,JNZ ,ENT ,ADJ ,LEV ,LI ,LC ,SI ,SC ,PUSH,
OR ,XOR ,AND ,EQ ,NE ,LT ,GT ,LE ,GE ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,
OPEN,READ,CLOS,PRTF,MALC,MSET,MCMP,EXIT };
// tokens and classes (operators last and in precedence order)
// copied from c4
enum {
Num = 128, Fun, Sys, Glo, Loc, Id,
Char, Else, Enum, If, Int, Return, Sizeof, While,
Assign, Cond, Lor, Lan, Or, Xor, And, Eq, Ne, Lt, Gt, Le, Ge, Shl, Shr, Add, Sub, Mul, Div, Mod, Inc, Dec, Brak
};
// fields of identifier
enum {Token, Hash, Name, Type, Class, Value, BType, BClass, BValue, IdSize};
// types of variable/function
enum { CHAR, INT, PTR };
// type of declaration.
enum {Global, Local};
int *text, // text segment
*stack;// stack
int * old_text; // for dump text segment
char *data; // data segment
int *idmain;
char *src, *old_src; // pointer to source code string;
int poolsize; // default size of text/data/stack
in
t *pc, *bp, *sp, ax, cycle; // virtual machine registers
int *current_id, // current parsed ID
*symbols, // symbol table
line, // line number of source code
token_val; // value of current token (mainly for number)
int basetype; // the type of a declaration, make it global for convenience
int expr_type; // the type of an expression
// function frame
//
// 0: arg 1
// 1: arg 2
// 2: arg 3
// 3: return address
// 4: old bp pointer<- index_of_bp
// 5: local var 1
// 6: local var 2
int index_of_bp; // index of bp pointer on stack
void next() {
char *last_pos;
int hash;
while (token = *src) {
++src;
if (token == '\n') {
if (assembly) {
// print compile info
printf("%d: %.*s", line, src-old_src, old_src);
old_src = src;
while (old_text < text) {
printf("%8.4s", & "LEA ,IMM ,JMP ,CALL,JZ ,JNZ ,ENT ,ADJ ,LEV ,LI ,LC ,SI ,SC ,PUSH,"
"OR ,XOR ,AND ,EQ ,NE ,LT ,GT ,LE ,GE ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,"
"OPEN,READ,CLOS,PRTF,MALC,MSET,MCMP,EXIT"[*++old_text * 5]);
if (*old_text <= ADJ)
printf(" %d\n", *++old_text);
else
printf("\n");
}
}
++line;
}
else if (token == '#') {
// skip macro, because we will not support it
while (*src != 0 && *src != '\n') {
src++;
}
}
else if ((token >= 'a' && token <= 'z') || (token >= 'A' && token <= 'Z') || (token == '_')) {
// parse identifier
last_pos = src - 1;
hash = token;
while ((*src >= 'a' && *src <= 'z') || (*src >= 'A' && *src <= 'Z') || (*src >= '0' && *src <= '9') || (*src == '_')) {
hash = hash * 147 + *src;
src++;
}
// look for existing identifier, linear search
current_id = symbols;
while (current_id[Token]) {
if (current_id[Hash] == hash && !memcmp((char *)current_id[Name], last_pos, src - last_pos)) {
//found one, return
token = current_id[Token];
return;
}
current_id = current_id + IdSize;
}
// store new ID
current_id[Name] = (int)last_pos;
current_id[Hash] = hash;
token = current_id[Token] = Id;
return;
}
else if (token >= '0' && token <= '9') {
// parse number, three kinds: dec(123) hex(0x123) oct(017)
token_val = token - '0';
if (token_val > 0) {
// dec, starts with [1-9]
while (*src >= '0' && *src <= '9') {
token_val = token_val*10 + *src++ - '0';
}
} else {
// starts with number 0
if (*src == 'x' || *src == 'X') {
//hex
token = *++src;
while ((token >= '0' && token <= '9') || (token >= 'a' && token <= 'f') || (token >= 'A' && token <= 'F')) {
token_val = token_val * 16 + (token & 15) + (token >= 'A' ? 9 : 0);
token = *++src;
}
} else {
// oct
while (*src >= '0' && *src <= '7') {
token_val = token_val*8 + *src++ - '0';
}
}
}
token = Num;
return;
}
else if (token == '/') {
if (*src == '/') {
// skip comments
while (*src != 0 && *src != '\n') {
++src;
}
} else {
// divide operator
token = Div;
return;
}
}
else if (token == '"' || token == '\'') {
// parse string literal, currently, the only supported escape
// character is '\n', store the string literal into data.
last_pos = data;
while (*src != 0 && *src != token) {
token_val = *src++;
if (token_val == '\\') {
// escape character
token_val = *src++;
if (token_val == 'n') {
token_val = '\n';
}
}
if (token == '"') {
*data++ = token_val;
}
}
src++;
// if it is a single character, return Num token
if (token == '"') {
token_val = (int)last_pos;
} else {
token = Num;
}
return;
}
else if (token == '=') {
// parse '==' and '='
if (*src == '=') {
src ++;
token = Eq;
} else {
token = Assign;
}
return;
}
else if (token == '+') {
// parse '+' and '++'
if (*src == '+') {
src ++;
token = Inc;
} else {
token = Add;
}
return;
}
else if (token == '-') {
// parse '-' and '--'
if (*src == '-') {
src ++;
token = Dec;
} else {
token = Sub;
}
return;
}
else if (token == '!') {
// parse '!='
if (*src == '=') {
src++;
token = Ne;
}
return;
}
else if (token == '<') {
// parse '<=', '<<' or '<'
if (*src == '=') {
src ++;
token = Le;
} else if (*src == '<') {
src ++;
token = Shl;
} else {
token = Lt;
}
return;
}
else if (token == '>') {
// parse '>=', '>>' or '>'
if (*src == '=') {
src ++;
token = Ge;
} else if (*src == '>') {
src ++;
token = Shr;
} else {
token = Gt;
}
return;
}
else if (token == '|') {
// parse '|' or '||'
if (*src == '|') {
src ++;
token = Lor;
} else {
token = Or;
}
return;
}
else if (token == '&') {
// parse '&' and '&&'
if (*src == '&') {
src ++;
token = Lan;
} else {
token = And;
}
return;
}
else if (token == '^') {
token = Xor;
return;
}
else if (token == '%') {
token = Mod;
return;
}
else if (token == '*') {
token = Mul;
return;
}
else if (token == '[') {
token = Brak;
return;
}
else if (token == '?') {
token = Cond;
return;
}
else if (token == '~' || token == ';' || token == '{' || token == '}' || token == '(' || token == ')' || token == ']' || token == ',' || token == ':') {
// directly return the character as token;
return;
}
}
}
void match(int tk) {
if (token == tk) {
next();
} else {
printf("%d: expected token: %d\n", line, tk);
exit(-1);
}
}
void expression(int level) {
// expressions have various format.
// but majorly can be divided into two parts: unit and operator
// for example `(char) *a[10] = (int *) func(b > 0 ? 10 : 20);
// `a[10]` is an unit while `*` is an operator.
// `func(...)` in total is an unit.
// so we should first parse those unit and unary operators
// and then the binary ones
//
// also the expression can be in the following types:
//
// 1. unit_unary ::= unit | unit unary_op | unary_op unit
// 2. expr ::= unit_unary (bin_op unit_unary ...)
// unit_unary()
int *id;
int tmp;
int *addr;
{
if (!token) {
printf("%d: unexpected token EOF of expression\n", line);
exit(-1);
}
if (token == Num) {
match(Num);
// emit code
*++text = IMM;
*++text = token_val;
expr_type = INT;
}
else if (token == '"') {
// continous string "abc" "abc"
// emit code
*++text = IMM;
*++text = token_val;
match('"');
// store the rest strings
while (token == '"') {
match('"');
}
// append the end of string character '\0', all the data are default
// to 0, so just move data one position forward.
data = (char *)(((int)data + sizeof(int)) & (-sizeof(int)));
expr_type = PTR;
}
else if (token == Sizeof) {
// sizeof is actually an unary operator
// now only `sizeof(int)`, `sizeof(char)` and `sizeof(*...)` are
// supported.
match(Sizeof);
match('(');
expr_type = INT;
if (token == Int) {
match(Int);
} else if (token == Char) {
match(Char);
expr_type = CHAR;
}
while (token == Mul) {
match(Mul);
expr_type = expr_type + PTR;
}
match(')');
// emit code
*++text = IMM;
*++text = (expr_type == CHAR) ? sizeof(char) : sizeof(int);
expr_type = INT;
}
else if (token == Id) {
// there are several type when occurs to Id
// but this is unit, so it can only be
// 1. function call
// 2. Enum variable
// 3. global/local variable
match(Id);
id = current_id;
if (token == '(') {
// function call
match('(');
// pass in arguments
tmp = 0; // number of arguments
while (token != ')') {
expression(Assign);
*++text = PUSH;
tmp ++;
if (token == ',') {
match(',');
}
}
match(')');
...
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here