
The source code must be written in C. For this assignment, I only need the software (Section 6.2); the design document (Section 6.1) and the testing (Section 6.3) can be ignored.


Dalhousie University
Department of Electrical and Computer Engineering
ECED 3403 – Computer Architecture
Assignment 1: Designing, implementing, and testing an XM3 disassembler

1 The problem

Assemblers take source modules written in assembly language and produce load modules (or object modules, as we will see in Assignment 2). Subsequently, load modules can be loaded into a target machine’s memory for execution by the machine’s loader (see Figure 1).

Figure 1: Assemblers produce load modules that can be loaded into memory by a loader

The load module contains the machine code equivalent of the instructions and data found in the source module. There are also loader directives, instructing the loader what to do with the load module; for example, where to place each instruction or data in the machine’s memory.

When testing programs that are executing, some machines support debuggers which display the equivalent assembly language instruction for each machine code instruction in memory (see Table 1). This is intended to make the tester’s life easier, reducing the need to constantly refer to the list file output from the assembler or linker (again, something for Assignment 2).

Table 1: Example of assembler file, memory contents, and equivalent debugger output

Assembler file    Memory contents    Debugger output
ld R1,R2          1000 580A          1000 LD R1,R2
add #1,R2         1002 408A          1002 ADD #1,R2
st R2,R1          1004 5C11          1004 ST R2,R1

The tool used to translate machine code to its assembler code equivalent is referred to as a disassembler. A disassembler can be part of a debugger or it can be a stand-alone application that converts the contents of a load module into an equivalent source module (see Figure 2).

The load module differs from a memory-resident machine language program in that it contains the machine code (instructions and data) and loader directives.
When a load module is loaded into memory, the loader follows the instructions specified by the directives and typically does not store the directives in memory, although the machine code and data are stored.

Figure 2: The disassembler creates assembler instructions from the contents of the load module

Disassemblers can be used by a variety of users, for example:
• Legacy software might only exist as load modules. Those responsible for the software could reconstruct it by running it through a disassembler.
• If disaster strikes and someone forgot to do backups of the assembler source files, the disassembler could be used to restore them from the load module.
• Organizations (both good and evil) wanting to know how a block of software works could run its load module through a disassembler to examine the code. This is sometimes referred to as reverse engineering.
• Similarly, if malignant software infects a machine, the white hats could extract the malignant software and disassemble it to determine what nefarious act it is performing.

2 Objectives

In this assignment, you are to design, implement, and test an XM3 disassembler which takes XM3 load modules and creates the equivalent XM3 assembly language source module. The finished product should support drag-and-drop input as well as command-line input.

The input to the disassembler should be an XM3 load module. Normally, load modules have an extension of XME. Files which do not contain loader records should be rejected and a diagnostic issued.

The disassembler should convert each load module record into the equivalent assembler instructions (there can be one or more instructions in each load module record). In some cases, a load module record could contain data rather than instructions, or a mixture of instructions and data.

If the disassembler detects an error (for example, a bad load module record), it should display the record, its record number, and the reason for the error.
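The error-reporting requirement above (show the record, its record number, and the reason) can be sketched as a small read loop. This is only an illustration under assumptions: the names `header_problem` and `scan_records`, the 600-character line buffer, and the diagnostic wording are hypothetical and not part of the assignment, and the only check made here is that each line starts with one of the three record types listed in Section 3.1 (S0, S1, S9).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns NULL if the line starts like an S0, S1, or S9
 * record, otherwise a short reason string to use in the diagnostic. */
const char *header_problem(const char *line)
{
    if (line[0] != 'S')
        return "record does not start with 'S'";
    if (line[1] != '0' && line[1] != '1' && line[1] != '9')
        return "record type is not S0, S1, or S9";
    return NULL;
}

/* Hypothetical helper: reads one record per line from 'in' and reports each
 * bad record on 'err' as: record number, record text, reason.
 * Returns the number of bad records seen. */
int scan_records(FILE *in, FILE *err)
{
    char line[600];          /* assumed to be long enough for one record */
    int record_number = 0;
    int bad = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        record_number++;
        const char *reason = header_problem(line);
        if (reason != NULL) {
            fprintf(err, "Record %d: \"%s\": %s\n", record_number, line, reason);
            bad++;
        }
    }
    return bad;
}
```

A caller could open the file named on the command line, pass it to `scan_records`, and reject the file if the returned count is not zero. Length and checksum checks would be added to the same loop once the record layout is known.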
Each correctly disassembled instruction or data should be written to the output file in the following format:

Instruction: The instruction equivalent of the opcode and its operands should be written in the correct format for the disassembled instruction. For example, 6001 would indicate that the low byte of register 1 should be assigned zero:

    MOVL #0,R1

The two branching instructions are exceptional cases: the actual address to be branched to should be output as a hexadecimal value prefixed with ‘#’. For example, if the address to be branched to is 1282 (hexadecimal), the output should be:

    BRA #1282

Directives: If there is a discontinuity in the load address in the load module record (for example, if location #100 is expected, but the load record indicated #180), the disassembler should output an origin directive (ORG):

    ORG #180

Other: An unknown value should be treated as a 16-bit hexadecimal data value and output using the WORD directive; for example, if the value FF14 was found as the next 16-bit value in the load record and it cannot be turned into an instruction, the output would be (the ‘#’ symbol indicates a hexadecimal value):

    WORD #FF14

The output file should be given the filename of the input file with an extension of DASM to indicate that the module is an assembly module that has been obtained from a load module using your disassembler. For example, TEST.XME would become TEST.DASM.

3 Load modules

As previously discussed, load modules contain directives and machine code (instructions and data) for the target machine. In our case, the target machine is XM3 and the load module contains S-Records.

3.1 The S-Record

The XM3 assembler creates XM3 load modules consisting of S-Record types.
The S-Record in data dictionary format is:

SRecord  = Header + Length + Address + Contents + CheckSum
Header   = [“S0” | “S1” | “S9”]
Length   = Byte * 00 to FF (Covers Address to CheckSum)
Address  = 2{Byte}2 * 0000 to FFFF
Contents = 0{Byte}31 * 0 to 31 instruction or data bytes (see footnote 1)
CheckSum = Byte * One’s complement of the sum of Length to Contents
Byte     = [00 | 01 | … | FE | FF]

Footnote 1: This is non-standard (that is, the S-Record definition was misinterpreted and has not been corrected yet). The S-Record definition states that the record length should be stored as 4-bit nibbles, from 0 to 63.

The three Header S-values are interpreted as follows:
S0: The source module name.
S1: The data values or instructions, or both, to be stored in contiguous memory locations.
S9: The initial entry point (i.e., the starting address).

The assembler calculates the CheckSum by summing each byte (Length to the last Contents byte) and taking its complement (C’s tilde operator, ‘~’). The loader (and soon-to-be-completed disassembler) determines whether the input record is valid by adding all the bytes read (i.e., the Length, Address, Contents, and Checksum fields) together. By including the Checksum field in this total, the sum will include its complement, giving what value? This value can be used to indicate whether the input record is correct.

A complete description of the S-Record format can be found here.

3.2 Example

The following is an XM3 load module in S-Record format:

S00A00004131612E61736DB3
S109100001689940FD2384
S9031000EC

The first record contains the source module name (S0); the second record contains instructions, or data, or both (S1); and the last record contains the starting address (S9).

For example, the S0 record is:
Length: 0A. Ten bytes long.
Address: 0000. The address field is ignored in S0 records.
Contents: 4131612E61736D. This is the source filename, consisting of 8-bit ASCII characters, displayed in hexadecimal. The filename is “A1a.asm”.
(You should check this in case it is incorrect.)
Checksum: B3. Is this correct?

While the S1 record contains:
Length: 09. Nine bytes long.
Address: 1000. The first byte is loaded into location 1000 (hex).
Contents: 01689940FD23. 01 is loaded into location 1000, 68 into location 1001, and so on. XM3 is little endian, so 68 is the most significant byte and 01 is the least significant byte.
CheckSum: 84.

And the S9 record’s contents are:
Length: 03. Three bytes in length. This length means that only the address field (two bytes) and CheckSum field (one byte) are included; the Contents are omitted. How do we know this?
Address: 1000. This is the starting address, assigned to the PC when the program is executed.
Checksum: EC.

Link for the S-Record format description cited in Section 3.1: http://www.amelek.gda.pl/avr/uisp/srecord.htm

4 The Disassembler

Once an S-Record has been read and determined to be valid, it is necessary to determine whether it is an instruction or data. This will involve taking the bytes read from the S-Record and combining them into 16-bit values; for example, using the S1 record from section 3.2, we find the 16-bit values are (in little endian): 6801, 4099, and 23FD.

We will now need the structure of each instruction (i.e., opcode and operands) to determine whether the 16-bit value read is an instruction or data. We can do this by examining the structure of each instruction in the XM3 Instruction Set Architecture manual and developing an algorithm to decode the 16-bit value; for example (see Table 24 in the manual):

6801: The most significant nibble is ‘6’, meaning that the instruction is either MOVL or MOVLZ. The next nibble is ‘8’; since the most-significant bit of the nibble is set, the instruction must be MOVLZ. The next eight bits are the byte to store in the register. The byte bits are 0000.0000 (i.e., #0), while the least-significant three bits indicate the destination register, 001, or R1.
From this we can reconstruct the instruction as:

    MOVLZ #0,R1

4099: The most-significant nibble is ‘4’, meaning that the instruction can be one of ADD through SXT. The next nibble is ‘0’, which narrows the instruction down to ADD. The next byte is ‘99’, the leftmost two bits are 10, indicating that R/C is set (a constant is being used) and W/B
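The steps described in Sections 3 and 4 (validate a record, pair its bytes into little-endian 16-bit values, then decode each value) can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the required design: the names `hex_value`, `parse_srecord`, `decode_movl`, and `list_s1_record` are hypothetical, the checksum test relies on a byte plus its one's complement always summing to FF, and only the MOVL/MOVLZ layout walked through above is decoded. Every other value is written with WORD purely because this sketch knows no other instructions; a full disassembler would decode ADD and the rest from the XM3 manual.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BYTES 256   /* Length byte plus the at most 255 bytes it covers */

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert the hex pairs after the "Sx" header into bytes[] (Length first).
 * Assumes the caller has already stripped the line ending.
 * Returns the number of bytes stored, or -1 for a malformed record. */
int parse_srecord(const char *rec, unsigned char *bytes)
{
    size_t len = strlen(rec);
    unsigned sum = 0;
    int count = 0;

    /* Shortest record: "Sx" + Length + 2 Address bytes + CheckSum = 10 chars. */
    if (len < 10 || len % 2 != 0 || len > (size_t)(2 + 2 * MAX_BYTES) || rec[0] != 'S')
        return -1;
    for (size_t i = 2; i < len; i += 2) {
        int hi = hex_value(rec[i]);
        int lo = hex_value(rec[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        bytes[count] = (unsigned char)(hi * 16 + lo);
        sum += bytes[count];
        count++;
    }
    if (bytes[0] != count - 1)      /* Length counts every byte that follows it */
        return -1;
    if ((sum & 0xFF) != 0xFF)       /* sum plus its complement must give FF */
        return -1;
    return count;
}

/* Decode a 16-bit value as MOVL or MOVLZ, following the bit layout described
 * in Section 4. Returns 1 and fills text on success, 0 for any other value. */
int decode_movl(unsigned word, char *text, size_t size)
{
    unsigned top5 = (word >> 11) & 0x1F;   /* nibble '6' plus the MOVL/MOVLZ bit */
    unsigned byte = (word >> 3) & 0xFF;    /* the eight data bits */
    unsigned reg  = word & 0x7;            /* destination register */

    if (top5 == 0x0C)
        snprintf(text, size, "MOVL #%X,R%u", byte, reg);
    else if (top5 == 0x0D)
        snprintf(text, size, "MOVLZ #%X,R%u", byte, reg);
    else
        return 0;
    return 1;
}

/* Write one output line per 16-bit value of an S1 record.
 * Returns 0 on success, -1 if the record is bad or is not an S1 record. */
int list_s1_record(const char *rec, FILE *out)
{
    unsigned char b[MAX_BYTES];
    char text[32];
    int n = parse_srecord(rec, b);

    if (n < 0 || rec[1] != '1')
        return -1;
    /* Contents occupy b[3] .. b[n-2]; b[n-1] is the CheckSum.
     * A trailing odd byte, if any, is ignored in this sketch. */
    for (int i = 3; i + 1 < n - 1; i += 2) {
        unsigned word = b[i] | ((unsigned)b[i + 1] << 8);   /* little endian */
        if (decode_movl(word, text, sizeof text))
            fprintf(out, "%s\n", text);
        else
            fprintf(out, "WORD #%04X\n", word);
    }
    return 0;
}
```

Calling `list_s1_record("S109100001689940FD2384", stdout)` with the S1 record from Section 3.2 writes `MOVLZ #0,R1`, `WORD #4099`, and `WORD #23FD`, one per line. The last two lines appear as WORD only because of the sketch's limited decoder.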
Jul 03, 2021