Peeking behind the curtain

What's new

I have just uploaded another small update to the Subversion repository. This update contains the following changes:
  • Implementation of all forms of the following addressing modes, both as source and as destination:
    • immediate long, word, byte;
    • indirect addressing: (Ax);
    • indirect addressing with pre-decrement: -(Ax);
    • indirect addressing with post-increment: (Ax)+;
  • Implementation of instructions:
    • MOVE.x #imm,Dy;
    • MOVEA.x #imm,Ay;
    • MOVEQ #imm,Dy;
    • MOVE.x Rx,Ry
  • The code was refactored to avoid repeating the same code chunks across addressing modes and instructions.
  • Implementation of dumping the compiled code to the console.
  • A new configuration option (comp_log_compiled) was added to turn it on/off.

Featuring...

It is always exciting to find out some magic details about the internal behavior of a complex system. I remember the first time I realized how texts were stored in Commodore Plus/4 games, back in the day. I was amazed that I could change the text that showed up in the scroller at the bottom of Tycoon Tex.
Let me offer you all a little excitement: I have just implemented a fun little feature in the E-UAE JIT engine. Now we can turn on dumping the compiled code to the console, together with the original Motorola 68k instructions that were compiled and the macroblocks that describe the intermediate form of the translation.
The purpose of this feature wasn’t (purely) entertainment: I was really fed up with not being able to debug the generated code properly. Previously, I inserted a trap instruction (tw) into the translated code at some point, so I could look at the output in the Grim Reaper window (which was awesome, but let’s not confuse it with actual debugging).
Too bad GDB is so limited: it cannot debug into any code segment that wasn’t loaded by DOS (such as generated code). Not to mention how cumbersome its console interface is... (Or am I missing something? Enlighten me, please.)
I would like to thank Frank Wille for the sources of the PowerPC disassembler that makes it possible to list the translated code.
How to turn on this feature: two settings control the logging:
  • comp_log – if set to “true” or “yes”, JIT logging is turned on and written to the standard output.
  • comp_log_compiled – if set to “true” or “yes”, the compiled code is listed in the JIT log.

Let’s see a small demonstration of this feature, shall we? (Not for the faint-hearted!)

The following list is the output for a very simple test program, iamalive.asm, slightly edited and formatted for educational purposes...
  1. M68k: ADD.L #$00000001,D1
    1. Mblk: load_memory_long
      Dism: lwz r15,64(r14)
      Mblk: load_memory_long
      Dism: lwz r3,68(r14)
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r3,16,26,26
    2. Mblk: load_memory_long
      Dism: lwz r3,4(r14)
    3. Mblk: load_register_long
      Dism: li r4,1
    4. Mblk: add_with_flags
      Dism: addco. r3,r3,r4
    5. Mblk: copy_nzcv_flags_to_register
      Dism: mcrxr cr2
      Dism: mfcr r15
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r15,16,26,26
  2. M68k: MOVE.W D1,(A0,$0180) == $00dff180
    1. Mblk: load_memory_long
      Dism: lwz r4,32(r14)
    2. Mblk: add_register_imm
      Dism: addi r5,r4,384
    3. Mblk: check_word_register
      Dism: extsh. r0,r3
      Mblk: copy_nz_flags_to_register
      Dism: mfcr r6
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r6,0,0,2
      Mblk: rotate_and_mask_bits
      Dism: rlwinm r15,r15,0,11,8
    4. Mblk: save_memory_long
      Dism: stw r3,4(r14)
    5. Mblk: save_memory_spec
      Dism: mr r4,r3
      Dism: mr r3,r5
      Dism: rlwinm r0,r3,18,14,29
      Dism: lis r5,27315
      Dism: ori r5,r5,23016
      Dism: lwzx r5,r5,r0
      Dism: lwz r5,16(r5)
      Dism: mtlr r5
      Dism: blrl
  3. M68k: BT.B #$fffffff8 == 0000001a (TRUE)
    1. Mblk: save_memory_long
      Dism: stw r15,64(r14)
      Mblk: save_memory_word
      Dism: sth r15,68(r14)
    2. Mblk: load_register_long
      Dism: lis r3,27606
      Dism: ori r3,r3,45096
      Mblk: save_memory_long
      Dism: stw r3,76(r14)
    3. Mblk: opcode_unsupported
      Dism: li r3,24824
      Dism: lis r4,27315
      Dism: ori r4,r4,21752
      Dism: bl 0x7f91acc0

  4. Done compiling
Colorful, isn't it? :)

Okay, let's try to understand what is going on.

The three Motorola 68k instructions that were compiled here are the lines prefixed with "M68k:". Roughly, the code does this:
1. Increase register D1 by one;
2. Write the lower word of register D1 to the address calculated from register A0 plus an offset of 0x180 (A0 was previously initialized to 0xDFF000, the base of the custom chipset register area). In layman's terms: set the background color (COLOR00 at 0xDFF180).
3. Go back to step 1.
Now, let's see the second level of the list:

First of all, the prefix "Mblk:" marks the macroblocks, while "Dism:" marks the actual PowerPC code, as disassembled.
As I mentioned earlier, some macroblocks can be optimized away (although this is not implemented yet). A macroblock produces at least one PowerPC instruction, but it can also produce a series of instructions.

The steps can be interpreted roughly as:
1.1. Load the arithmetic flags from the memory where the interpretive emulator stores them.
1.2. Load the previous content of the emulated D1 register into a PPC register.
1.3. Load the constant for the add instruction (one) into a PPC register.
1.4. Add the second register to the first one (increase D1 by one).
1.5. Collect the arithmetic flags produced by the operation in the flags register, and copy the carry into the X flag.
2.1. Load the previous content of the emulated A0 register into a PPC register.
2.2. Add the offset (0x180) to the content of A0 and put the result into a new PPC register.
2.3. Test the lower word of the emulated D1 register (still held in a PPC register) to set the N and Z flags, then clear V and C, as MOVE requires.
2.4. Save the modified D1 register back to memory for the interpretive emulator.
2.5. Calculate the offset, load the address of the memory write handler (namely the custom chipset write handler) and call it. This function belongs to the interpretive emulation and is written in C, so every emulated register held in a volatile PPC register must be written back to memory first: the C code won't preserve them. (This is why we stored the D1 register in step 2.4.)
3.1. Save the arithmetic flags back to the memory where the interpretive emulator stores them. (They were kept in a non-volatile register, so they were preserved while we called the helper function in step 2.5.)
3.2. Update the emulated PC register to the current state for the following instructions.
3.3. Call the interpretive emulation for the branch instruction, passing its opcode ($60F8). Branches are not implemented in the JIT yet, so we reuse the interpretive implementation.
4. Done. Phew.
Funny, eh? :)

If you are not familiar with assembly, don't strain yourself too much trying to understand this techno-blahblah.

For the rest of you: who can spot what could be optimized in the compiled code?