IMPORTANT RULES FOR ARM CODE WRITERS

Date: 17/5/91
Issue: 2.5

Every effort has been made to ensure that the information in this document is true and correct at the date of issue. Products described in this document, however, are subject to continuous development and improvements and Advanced RISC Machines Ltd (and other contributors) reserve the right to change their specifications at any time. Advanced RISC Machines Ltd cannot accept liability for any loss or damage arising from the use of any information or particulars in this document.


In order to be upwards compatible with future versions of the ARM processor family NEVER use any of the undefined instruction formats: both those shown in the manual as "Undefined" which the processor traps AND those which are not shown in the manual and which don't trap (for example a Multiply instruction where bit 5 or 6 of the instruction is set). In addition the "NV" (never executed) instruction class should not be used [It is recommended that the instruction "MOV R0,R0" be used as a general purpose NOP].

This document lists the instruction code sequences to be avoided. It is *STRONGLY* recommended that you take the time to familiarise yourself with these cases because some will only fail under particular circumstances which may not arise during testing.


Unless a program is being targeted SPECIFICALLY for a single version of the ARM processor family, all of these recommendations should be adhered to.
  1. TSTP/TEQP/CMPP/CMNP: Changing mode
    #################################################################### # When the processor's mode is changed by altering the mode bits # # in the PSR using a data processing operation, care must be taken # # not to access a banked register (R8-R14) in the following # # instruction. Accesses to the unbanked registers (R0-R7,R15) are # # safe. # #################################################################### # Applicability: ARM2 # #################################################################### The following instructions are affected, but note that mode changes can only be made when the processor is in a non-user mode:-

    TSTP Rn,<Op2>
    TEQP Rn,<Op2>
    CMPP Rn,<Op2>
    CMNP Rn,<Op2>

    These are the only operations that change all the bits in the PSR (including the mode bits) without affecting the PC (thereby forcing a pipeline refill during which time the register bank select logic settles).

    e.g. Assume processor starts in Supervisor mode in each case:-

    a) TEQP PC,#0

         MOV  R0,R0            SAFE: NOP added between mode change and access
         ADD  R0,R1,R13_usr          to a banked register (R13_usr).
    

    b) TEQP PC,#0

         ADD  R0,R1,R2         SAFE: No access made to a banked register
    

    c) TEQP PC,#0

         ADD  R0,R1,R13_usr    *FAILS*: Data NOT read from Register R13_usr!
    

    The safest default is always to add a NOP (e.g. MOV R0,R0) after a mode changing instruction; this will guarantee correct operation regardless of the code sequence that follows it.

    2) LDM/STM: Forcing transfer of the user bank (Part 1)

    ################################################################### # Don't use write back when forcing user bank transfer in LDM/STM # ###################################################################
      # Applicability: ARM2,ARM3                                        #
    
    ###################################################################

    For STM instructions the S bit is redundant as the PSR is always stored with the PC whenever R15 is in the transfer list. In user mode programs the S bit is ignored, but in other modes it has a second interpretation. S=1 is used to force transfers to take values from the user register bank instead of the current register bank. This is useful for saving the user state on process switches.
    Similarly, in LDM instructions the S bit is redundant if R15 is not in the transfer list. In user mode programs, the S bit is ignored, but in non-user mode programs where R15 is not in the transfer list, S=1 is used to force loaded values to go to the user registers instead of the current register bank.
    In both cases where the processor is in a non-user mode and transfer to/from the user bank is forced by setting the S bit, write back of the base will also be to the user bank though the base will be fetched from the current bank. Therefore don't use write back when forcing user bank transfer in LDM/STM.

    e.g. In all cases, the processor is assumed to be in a non-user mode and

         <Rlist> is assumed not to include R15:-
    
         STMxx Rn!,<Rlist>   SAFE: Storing non-user registers with write back to
                                   the non-user base register
    
         LDMxx Rn!,<Rlist>   SAFE: Loading non-user registers with write back to
                                   the non-user base register
    
         STMxx Rn,<Rlist>^   SAFE: Storing user registers, but no base
                                   write-back
    
         STMxx Rn!,<Rlist>^  *FAILS*: Base fetched from non-user register, but
                                       written back into user register
    
         LDMxx Rn!,<Rlist>^  *FAILS*: Base fetched from non-user register, but
                                       written back into user register
    

    3) LDM: Forcing transfer of the user bank (Part 2)

    ###################################################################### # When loading user bank registers with an LDM in a non-user mode, # # care must be taken not to access a banked register (R8-R14) in the #
      # following instruction. Accesses to the unbanked registers          #
      # (R0-R7,R15) are safe.                                              #
    
    ######################################################################
      # Applicability: ARM2,ARM3                                           #
    
    ######################################################################

    Because the register bank switches from user mode to non-user mode during the first cycle of the instruction following an "LDM Rn,<Rlist>^", an attempt to access a banked register in that cycle may cause the wrong register to be accessed.

    e.g. In all cases, the processor is assumed to be in a non-user mode and

         <Rlist> is assumed not to include R15:-
    

    LDM Rn,<Rlist>^

       ADD R0,R1,R2             SAFE: Access to unbanked registers after LDM^
    

    LDM Rn,<Rlist>^

       MOV R0,R0                SAFE: NOP inserted before banked register used
       ADD R0,R1,R13_svc              following an LDM^
    

    LDM Rn,<Rlist>^

       ADD R0,R1,R13_svc        *FAILS*: Accessing a banked register immediately
                                         after an LDM^ returns the wrong data!
    

    ADR R14_svc, saveblock
    LDMIA R14_svc, {R0 - R14_usr}^
    LDR R14_svc, [R14_svc,#15*4] *FAILS*: Banked base register (R14_svc)

       MOVS   PC, R14_svc                        used immediately after the LDM^
    

    ADR R14_svc, saveblock
    LDMIA R14_svc, {R0 - R14_usr}^

       MOV   R0,R0                      SAFE: NOP inserted before banked
       LDR   R14_svc, [R14_svc,#15*4]         register (R14_svc) used
    
    MOVS PC, R14_svc NOTE:
    The ARM2 and ARM3 processors *usually* give the expected result, but cannot
    be guaranteed to do so under all circumstances. Therefore this code sequence should be avoided in future.

    4) SWI/Undefined Instruction trap interaction

    ###################################################################### # Care must be taken when writing an undefined instruction handler #
      # to allow for an unexpected call from a SWI instruction.            #
    
    # The erroneous SWI call should be intercepted and redirected to the #
      # software interrupt handler                                         #
    
    ######################################################################
      # Applicability: ARM2                                                #
    
    ######################################################################

    The implementation of the CDP instruction on ARM2 causes a Software Interrupt (SWI) to take the Undefined Instruction trap if the SWI was the next instruction after the CDP.
    e.g.
    SIN F0,F1

        SWI &11        *FAILS*: ARM2 will take the undefined instruction trap
                                instead of software interrupt trap.
    

    All Undefined Instruction handler code should check the failed instruction to see if it is a SWI, and if so pass it over to the software interrupt handler.

    5) Undefined instruction/Prefetch abort trap interaction

    ###################################################################### # Care must be taken when writing the Prefetch abort trap handler to #
      # allow for an unexpected call due to an undefined instruction       #
    
    ######################################################################
      # Applicability: ARM2,ARM3                                           #
    
    ######################################################################

    When an undefined instruction is fetched from the last word of a page, where the next page is absent from memory, the undefined instruction will cause the undefined instruction trap to be taken, and the following (aborted) instructions will cause a prefetch abort trap. One might expect the undefined instruction trap to be taken first, then the return to the succeeding code will cause the abort trap. In fact the prefetch abort has a higher priority than the undefined instruction trap, so the prefetch abort handler is entered _before_ the undefined instruction trap, indicating a fault at the address of the undefined instruction (which is in a page which is actually present). A normal return from the prefetch abort handler (after loading the absent page) will cause the undefined instruction to execute and take the trap correctly. However the indicated page is already present, so the prefetch abort handler may simply return control, causing an infinite loop to be entered.
    Therefore, the prefetch abort handler should check whether the indicated fault is in a page which is actually present. If so, the above condition must be present and so control should be passed to the undefined instruction handler. This will restore the expected sequential nature of the execution sequence; a normal return from the undefined instruction handler will cause the next instruction to be fetched (which will abort), the prefetch abort handler will be reentered (with an address pointing to the absent page), and execution can proceed normally.


    This section highlights some obscure cases of ARM operation which should be borne in mind when writing code.
    1. Use of R15


      • WARNING: When the PC is used as a destination, operand, base or shift *
      • register, different results will be obtained depending on *
      • the instruction and the exact usage of R15 *
      • Applicability: ARM2,ARM3 *
      Full details of the value derived from or written into R15+PSR for each instruction class is given in the datasheet. Care must be taken when using R15 because small changes in the instruction can yield significantly different results.

      e.g. Consider data operations of the type:-

                      <opcode>{cond}{S} Rd,Rn,Rm
                 or   <opcode>{cond}{S} Rd,Rn,Rm,<shiftname> Rs
      
      a) When R15 is used in the Rm position, it will give the value of the PC
             together with the PSR flags.
      
      b) When R15 is used in the Rn or Rs positions, it will give the value of
             the PC without the PSR flags (PSR bits replaced by zeros).
      

      MOV R0,#0
      ORR R1,R0,R15 ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)

          ORR R2,R15,R0   ; R2:=PC      (bits 31:26,1:0 set to zero)
      
      NOTE:
      The relevant instruction description in the ARM datasheets should be
      consulted for full details of the behaviour of R15.

      2) STM: Inclusion of the base in the register list


      • WARNING: In the case of a STM with writeback that includes the base *
      • register in the register list, the value of the base *
      • register stored depends upon its position in the register *
      • list *
      • Applicability: ARM2,ARM3 *
      During a STM, the first register is written out at the start of the second cycle of the instruction. When writeback is specified, the base is written back at the end of the second cycle. A STM which includes storing the base with the base as the first register to be stored will therefore store the unchanged value, whereas with the base second or later in the transfer order, it will store the modified value.

      e.g.
      MOV R5,#&1000
      STMIA R5!,{R5-R6} ; Stores value of R5=&1000

      MOV R5,#&1000
      STMIA R5!,{R4-R5} ; Stores value of R5=&1008

      3) MUL/MLA: Register restrictions


      • Given MUL Rd,Rm,Rs *
      • or MLA Rd,Rm,Rs,Rn *
      • *
      • Then Rd & Rm must be different registers *
      • Rd must not be R15 *
      • Applicability: ARM2,ARM3 *
      Due to the way that Booth's algorithm has been implemented, certain combinations of operand registers should be avoided. (The assembler will issue a warning if these restrictions are overlooked.) The destination register (Rd) should not be the same as the Rm operand register, as Rd is used to hold intermediate values and Rm is used repeatedly during the multiply. A MUL will give a zero result if Rm=Rd, and a MLA will give a meaningless result.
      The destination register (Rd) should also not be R15. R15 is protected from modification by these instructions, so the instruction will have no effect, except that it will put meaningless values in the PSR flags if the S bit is set.
      All other register combinations will give correct results, and Rd, Rn and Rs may use the same register when required.

      4) LDM/STM: Address Exceptions


      • WARNING: Illegal addresses formed during a LDM or STM operation will *
      • not cause an address exception *
      • Applicability: ARM2,ARM3 *
      Only the address of the first transfer of a LDM or STM is checked for an address exception; if subsequent addresses over- or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap.

      e.g. Assume processor is in a non-user mode & MEMC being accessed:-

             {these examples are very contrived}
      

      MOV R0,#&04000000 ; R0=&04000000 STMIA R0,{R1-R2} ; Address exception reported (base address illegal)

      MOV R0,#&04000000

          SUB R0,R0,#4       ; R0=&03FFFFFC
      
      STMIA R0,{R1-R2} ; No address exception reported (base address legal)
                             ; code will overwrite data at address &00000000
      
      NOTE:
      The exact behaviour of the system depends upon the memory manager to which
      the processor is attached; in some cases, the wraparound may be detected and the processor aborted.

      5) LDC/STC: Address Exceptions


      • WARNING: Illegal addresses formed during a LDC or STC operation will *
      • not cause an address exception (affects LDF/STF) *
      • Applicability: ARM2,ARM3 *
      The coprocessor data transfer operations act like STM and LDM with the processor generating the addresses and the coprocessor supplying/reading the data. As with LDM/STM, only the address of the first transfer of a LDM or STM is checked for an address exception; if subsequent addresses over- or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap. Note that the floating point LDF/STF instructions are forms of LDC & STC!

      e.g. Assume processor is in a non-user mode & MEMC being accessed:-

             {these examples are very contrived}
      

      MOV R0,#&04000000 ; R0=&04000000 STC CP1,CR0,[R0] ; Address exception reported (base address illegal)

      MOV R0,#&04000000

          SUB R0,R0,#4       ; R0=&03FFFFFC
          STFD F0,[R0]       ; No address exception reported (base address legal)
                             ; code will overwrite data at address &00000000
      
      NOTE:
      The exact behaviour of the system depends upon the memory manager to which
      the processor is attached; in some cases, the wraparound may be detected and the processor aborted.

      6) LDC: Data transfers to a coprocessor fetch more data than expected


      • Data to be transferred to a coprocessor with the LDC instruction should *
      • never be placed in the last word of an addressable chunk of memory, nor *
      • in the word of memory immediately preceding a read-sensitive memory *
      • location *
      • Applicability: ARM3 *
      Due to the pipelining introduced into the ARM3 coprocessor interface, an LDC operation will cause one extra word of data to be fetched from the internal cache or external memory by ARM3 and then discarded; if the extra data is fetched from an area of external memory marked as cacheable, a whole line of data will be fetched and placed in the cache. A particular case in point is that an LDC whose data ends at the last word of a memory page will load and then discard the first word (and hence the first cache line) of the next page. A minor effect of this is that it may occasionally cause an unnecessary page swap in a virtual memory system. The major effect of it is that (whether in a virtual memory system or not), the data for an LDC should never be placed in the last word of an addressable chunk of memory: the LDC will attempt to read the immediately following non-existent location and thus produce a memory fault.

      e.g. Assume processor is in a non-user mode, FPU hardware attached and MEMC

           being accessed:-
             {this example is very contrived}
      

      MOV R13,#&03000000 ; R13=Address of I/O space STFD F0,[R13,#-8]! ; Store F.P. register 0 at top of physical memory

                              ; (two words of data transferred)
      
      LDFD F1,[R13],#8 ; Load F.P. register 1 from top of physical memory
                              ; but THREE words of data are transferred, and the
                              ; third access will read from I/O space which may be
                              ; read sensitive!  *** BEWARE ***