Changes between Version 19 and Version 20 of Writing Rules/RISC


Timestamp:
Jan 14, 2008, 4:57:53 PM (16 years ago)
Author:
Nicolas Pouillon
Comment:

Cosmetic updates

  • Writing Rules/RISC

    v19 v20  
    2020On one hand, the same ISS is encapsulated in different wrappers to generate several simulation models, corresponding to several abstraction levels: CABA (Cycle-Accurate Bit-Accurate), TLM-T (Transaction Level Models with Time), and PV (Programmer View, untimed). On the other hand, it is possible to use the same wrapper for different types of processor architectures. As illustrated below, all simulation models can be obtained as the Cartesian product of the set of ISSs and the set of wrappers.
    2121
    22 ||                                   || CABA Wrapper                || TLM-T Wrapper               || PV Wrapper        ||
    23 || ISS MIPSR3000        || CABA Model MIPS          || TLM-T Model MIPS          || PV Model MIPS  ||
    24 || ISS PPC405              || CABA Model PPC            || TLM-T Model PPC            || PV Model PPC   ||
    25 || ISS OpenRISC       || CABA Model OpenRISC || TLM-T Model OpenRISC || PV Model OpenRISC ||
     22||                      || CABA Wrapper          || TLM-T Wrapper         || PV Wrapper        ||
     23|| ISS MIPSR3000        || CABA Model MIPS       || TLM-T Model MIPS      || PV Model MIPS     ||
     24|| ISS PPC405           || CABA Model PPC        || TLM-T Model PPC       || PV Model PPC      ||
     25|| ISS OpenRISC         || CABA Model OpenRISC   || TLM-T Model OpenRISC  || PV Model OpenRISC ||
    2626
    2727The method has been demonstrated for the MIPSR3000 and PPC 405 processors, and can be simply extended to the OpenRISC, Sparc, Nios, and MicroBLAZE processors.
    2828
    29 This modeling approach supposes that all ISS implement the same generic API (Application Specific Interface), as this API must be independant from both the procesor architecture, and the wrapper type.
     29This modeling approach supposes that all ISSs implement the same generic API (Application Programming Interface), as this API must be independent of both the processor architecture and the wrapper type.
    3030
    3131The proposed method assumes that the processors use the '''!VciXcache''' cache controller available in the SoCLib library to interface with the VCI interconnect. This modular approach allows the modeling effort of the L1 cache controller to be shared. The functional validation and debugging of this component has been a tedious task, so such reuse is probably a good policy. Nevertheless, a clean procedural interface has been defined between the processor core and the cache controller, and the cache behaviour can be easily modified if required. 
    3232
    33 Finally this generic approach has been exploited to develop the gdbServer module that is mandatory to help the debug of the multi-tasks application software running on the MP-SoC architectures modeled with SoCLib. This tool can be used for all simulation models compliant with the method described below.
     33Finally this generic approach has been exploited to develop the GdbServer module that is mandatory to help the debugging of multi-task applications running on the MP-SoC architectures modeled with SoCLib. This tool can be used for all simulation models compliant with the method described below.
    3434
    3535= B) Generic ISS API =
    3636
    37 As explained in the introduction, the modeling method relies on a generic ISS API, usable by any 32 bits RISC processor, and by the three wrappers CABA, TLM-T & PV. The Instruction Set Simulator corresponding to a given processor handles a set of registers definning the processor internal state. The API described below defines a procedural interface to allows the various  wrappers  to access those registers. The main access function is the '''step()''' function, that executes one ISS step : For an untimed model (PV wrapper) one step corresponds to one instruction. For a timed model (CABA wrapper or TLM-T wrapper), one step corresponds to one cycle.
    38 
    39 
    40  * '''inline void reset()'''
    41 This function reset all registers defining the processor internal state.
     37As explained in the introduction, the modeling method relies on a generic ISS API, usable by any 32-bit RISC processor and by the three wrappers (CABA, TLM-T and PV). The Instruction Set Simulator corresponding to a given processor handles a set of registers defining the processor internal state. The API described below defines a procedural interface that allows the various wrappers to access those registers.
     38
     39Function '''step()''' is the main entry point; it executes one ISS step:
     40 * For an untimed model (PV wrapper) one step corresponds to one instruction.
     41 * For a timed model (CABA wrapper or TLM-T wrapper), one step corresponds to one cycle.
     42
     43
     44API:
     45
     46 * '''inline void reset()'''
     47
     48This function resets all registers defining the processor internal state.
    4249
    4350 * '''inline bool isBusy()'''
    44 This function is only used by timed wrappers (CABA & TLM-T). In RISC processors, most instructions have a visible latency of one cycle. But some instructions (such as multiplication or division) can have a visible latency larger than one cycle. This function is called by the CABA and TLM-T wrappers before executing one step : If the processor is busy, the wrapper calls the '''nullStep()''' function. If the processor is available, the wrapper may call the '''step()''' function to execute one instruction.
     51
     52This function is only used by timed wrappers (CABA & TLM-T). In RISC processors, most instructions have a visible latency of one cycle. But some instructions (such as multiplication or division) can have a visible latency longer than one cycle. This function is called by the CABA and TLM-T wrappers before executing one step : If the processor is busy, the wrapper calls the '''nullStep()''' function. If the processor is available, the wrapper may call the '''step()''' function to execute one instruction.
    4553
    4654 * '''inline void step()'''
     55
    4756This function executes one instruction. All processor internal registers can be modified.
    4857
    4958 * '''inline void nullStep()'''
     59
    5060This function performs one internal step of a long instruction.
    5161 
    5262 * '''inline void getInstructionRequest (bool & req , enum !InsAccessType & type, uint32_t & address)'''
    53 This function is used by the wrapper to obtain from the ISS the instruction request parameters. The '''req''' parameter is true when there is a valid request. The '''address''' parameter is the instruction address. The  '''type''' parameter can have the values defined below: 
     63
     64This function is used by the wrappers to obtain from the ISS the instruction request parameters. The '''req''' parameter is true when there is a valid request. The '''address''' parameter is the instruction address. The  '''type''' parameter can have the values defined below: 
    5465{{{
    5566enum InsAccessType {
    56     RC ,  // Read Instruction Cached
    57     RU ,  // Read Instruction Uncached
     67    RC,  // Read Instruction Cached
     68    RU,  // Read Instruction Uncached
    5869};
    5970}}}
    6071
    6172 * '''inline void getDataRequest (bool &req , enum !DataAccessType  & type, uint32_t & address, uint32_t & wdata)'''
     73
    6274This function is used by the wrapper to obtain from the ISS the data request parameters. The '''req''' parameter is true when there is a valid request. The '''address''' parameter is the data address, and the '''wdata''' parameter is the data value to be written. The  '''type''' parameter is  defined below :
    6375{{{
    6476enum DataAccessType {
    65     RW ,   // Read Word Cached
    66     RH ,  // Read Half Cached
    67     RB  ,  // Read Byte Cached
    68     RZ ,   // Cache Line Invalidate
    69     WW ,  // Write Word
    70     WH ,  // Write Half
    71     WB ,  // Write Byte
    72     SC ,  // Store Conditional Word
    73     LL , // Load Linked Word
     77    READ_WORD,   // Read Word
     78    READ_HALF,   // Read Half
     79    READ_BYTE,   // Read Byte
     80    LINE_INVAL,  // Cache Line Invalidate
     81    WRITE_WORD,  // Write Word
     82    WRITE_HALF,  // Write Half
     83    WRITE_BYTE,  // Write Byte
     84    STORE_COND,  // Store Conditional Word
     85    READ_LINKED, // Load Linked Word
    7486};
    7587}}}
    7688
    77  * '''inline void setInstruction (bool error, uint32_t ins)'''
     89 * '''inline void setInstruction (bool error, uint32_t ins)'''
     90
    7891This function is used by the wrapper to transmit to the ISS, the instruction to be executed ('''ins''' parameter). In case of exception (bus error), the '''error''' parameter is set.
    7992
    8093 * '''inline void setDataResponse (bool error, uint32_t rdata)'''
    81 This function is used by the wrapper to transmit to the ISS, the response to the data request. In case of a read request, the  '''rdata''' parameter contains the read value. In case of exception (bus error), the '''error''' parameter is set. In any case, this function must reset the ISS data request.
    82 
    83  * '''inline void setWriteBerr ()'''
     94
     95This function is used by the wrapper to transmit to the ISS, the response to the data request. In case of a read request, the  '''rdata''' parameter contains the read value. In case of exception (bus error), the '''error''' parameter is set.
     96
     97In any case, this function must reset the ISS data request.
     98
     99 * '''inline void setWriteBerr ()'''
     100
    84101This function is used by the wrapper to signal asynchronous bus errors in case of a write access, which is non-blocking for the processor.
    85102
    86  * '''inline void setIrq (uint32_t irq)'''
    87 This function is used by the wrapper to signal the current value of the interrupt lines. For each processor, the number of interrupt lines must be defined by the ISS variable '''n_irq'''.
     103 * '''inline void setIrq (uint32_t irq)'''
     104
     105This function is used by the wrapper to signal the current value of the interrupt lines. For each processor, the number of interrupt lines must be defined by the ISS static variable '''n_irq'''.
    88106 
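The call sequence described in this section can be sketched as follows for a timed wrapper. This is only an illustration under stated assumptions: `ToyIss`, `runOneCycle()` and `demoExecutedAfter()` are hypothetical names that do not exist in SoCLib, the instruction memory is a plain vector instead of a cache, the data-side functions are left out, and instruction word 1 is arbitrarily taken to be a "long" instruction that keeps the ISS busy for two extra cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum InsAccessType { RC, RU };

// Hypothetical toy ISS (not a SoCLib class) exposing a subset of the generic API.
class ToyIss {
    uint32_t r_pc;        // address of the next instruction to fetch
    uint32_t r_busy;      // remaining cycles of a long instruction
    uint32_t r_ins;       // instruction delivered by the wrapper
    bool     r_ins_error; // bus error reported with the instruction
public:
    uint32_t executed;    // instructions started so far (demo bookkeeping only)

    ToyIss() { reset(); }
    inline void reset() { r_pc = 0; r_busy = 0; r_ins = 0; r_ins_error = false; executed = 0; }
    inline bool isBusy() { return r_busy > 0; }
    inline void nullStep() { if (r_busy > 0) --r_busy; }
    inline void getInstructionRequest(bool &req, enum InsAccessType &type, uint32_t &address) {
        req = true; type = RC; address = r_pc;
    }
    inline void setInstruction(bool error, uint32_t ins) { r_ins_error = error; r_ins = ins; }
    inline void step() {
        if (r_ins_error) return;     // a real ISS would enter its exception handler here
        if (r_ins == 1) r_busy = 2;  // assumed long instruction: two extra busy cycles
        r_pc += 4;
        ++executed;
    }
};

// One cycle of a timed wrapper: nullStep() while the ISS is busy, otherwise
// fetch the requested instruction, hand it to the ISS and call step().
template <typename iss_t>
void runOneCycle(iss_t &iss, const std::vector<uint32_t> &memory) {
    if (iss.isBusy()) { iss.nullStep(); return; }
    bool req = false;
    InsAccessType type = RC;
    uint32_t address = 0;
    iss.getInstructionRequest(req, type, address);
    if (req) {
        std::size_t index = address / 4;
        bool error = index >= memory.size();  // out-of-range fetch modeled as a bus error
        iss.setInstruction(error, error ? 0 : memory[index]);
    }
    iss.step();
}

// Runs a four-word program (the second word is the long instruction).
inline uint32_t demoExecutedAfter(unsigned cycles) {
    ToyIss iss;
    std::vector<uint32_t> memory;
    memory.push_back(0); memory.push_back(1); memory.push_back(0); memory.push_back(0);
    assert(!iss.isBusy());  // a freshly reset ISS is expected to be available
    for (unsigned c = 0; c < cycles; ++c) runOneCycle(iss, memory);
    return iss.executed;
}
```

With this toy program, the two cycles following the long instruction are spent in `nullStep()`, so the third instruction only starts at the fifth cycle.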
    89107 = C) ISS internal organisation =
    90108
    91 As an example, we present the general structure of the MIPS R3000 ISS (chronogram of figure 1). The instruction fetch, instruction decode, and instruction execution are done in one cycle. A specific register '''r_npc''' is introduced to model the delayed branch mechanism : the instruction following a branch instruction is always executed. The load instructions are executed in two cycles, as those instructions require two cache access (one for the instruction, one for the data). The ISS can issue two simultaneous request for the instruction cache, and the data cache, but those requests are done for different instructions.
     109As an example, we present the general structure of the MIPS-R3000 ISS (chronogram of figure 1). The instruction fetch, instruction decode, and instruction execution are done in one cycle.
     110
     111A specific register '''r_npc''' is introduced to model the delayed branch mechanism : the instruction following a branch instruction is always executed.
     112
     113The load instructions are executed in two cycles, as those instructions require two cache accesses (one for the instruction, one for the data).
     114The ISS can issue two simultaneous requests, one to the instruction cache and one to the data cache, but those requests are made for different instructions.
    92115
    93116[[Image(mips_iss.png, nolink)]]
    94117
    95 The '''r_pc''' et '''r_npc''' registers contain respectively the current instruction address, and the next instruction address. The wrapper can obtain the PC content using the '''getInstructionRequest()''' function, fetch the instruction in the cache (or in memory in case of MISS), and  propagate the requested intruction to the ISS using the '''setInstruction()''' function. The wrapper starts the instruction execution using the '''step()''' function. The general registers '''r_gp''', as well as the '''r_mem''' registers defining the possible data  access,  are modified. If, at the end of cycle (i) the '''r-mem''' registers contain a valid data access, this access will be performed during the next cycle, in parallel with the execution of instruction executed at cycle (i+1).
    96 
    97 From an implementation point of view, a specific ISS is implemented by a class '''processorIss'''. This class inherits the class '''genericIss''', that defines the prototypes of the access function presented in section B, (defined as virtual functions).
     118The '''r_pc''' and '''r_npc''' registers contain respectively the current instruction address, and the next instruction address.
     119The wrapper can obtain the PC content using the '''getInstructionRequest()''' function, fetch the instruction in the cache (or in memory in case of MISS), and propagate the requested instruction to the ISS using the '''setInstruction()''' function.
     120
     121The wrapper starts the instruction execution using the '''step()''' function. The general registers '''r_gp''', as well as the '''r_mem''' registers defining the possible data  access,  are modified.
     122
     123At the end of cycle (i), if the '''r_mem''' registers contain a valid data access, this access will be performed during the next cycle, in parallel with the execution of the instruction at cycle (i+1).
     124
     125From an implementation point of view, a specific ISS is implemented by a class '''processorIss'''. This class inherits from the class '''soclib::common::Iss''', which defines the prototypes of the access functions presented in section B (declared as pure virtual methods).
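The inheritance scheme can be illustrated with the reduced sketch below. The names `GenericIssSketch`, `CounterIss` and `runSteps()` are hypothetical and only stand in for the generic base class and a processor-specific ISS; the real '''soclib::common::Iss''' class is not reproduced here, only a few of the section B functions are declared, and the value chosen for '''n_irq''' is arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generic base class: a subset of the
// section B functions, declared as pure virtual methods.
class GenericIssSketch {
public:
    virtual ~GenericIssSketch() {}
    virtual void reset() = 0;
    virtual bool isBusy() = 0;
    virtual void step() = 0;
    virtual void nullStep() = 0;
    virtual void setIrq(uint32_t irq) = 0;
};

// Hypothetical processor-specific ISS: it only advances r_pc / r_npc.
class CounterIss : public GenericIssSketch {
    uint32_t r_pc;   // current instruction address
    uint32_t r_npc;  // next instruction address
    uint32_t r_irq;  // latched interrupt lines
public:
    static const uint32_t n_irq = 6;  // assumed number of interrupt lines

    CounterIss() { reset(); }
    void reset() override { r_pc = 0; r_npc = 4; r_irq = 0; }
    bool isBusy() override { return false; }
    void step() override { r_pc = r_npc; r_npc += 4; }
    void nullStep() override {}
    void setIrq(uint32_t irq) override {
        assert(n_irq > 0 && n_irq < 32);     // keeps the shift below well defined
        r_irq = irq & ((1u << n_irq) - 1u);  // keep only the n_irq low-order lines
    }
    uint32_t pc() const { return r_pc; }
    uint32_t irq() const { return r_irq; }
};

// Drives any ISS through the base-class interface only.
inline void runSteps(GenericIssSketch &iss, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        if (iss.isBusy()) iss.nullStep(); else iss.step();
    }
}
```

Code written against the base class, such as `runSteps()`, does not need to know which processor it is driving.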
    98126
    99127= D) Generic cache controller =
    100128
    101 The hardware component '''!VciXcache''' is a generic cache controler, that can be used by various processor cores. It contains two separated instruction and data caches, but has a single VCI port to acces the VCI interconnect. The cache line width, and the cache size are defined as independant parameters for the data cache and the instruction cache.  On the processor side, the cache controler can receive two requests at each cycle : one instruction request (read only), and one data request (read or write). Those requests, and the corresponding responses are transmited through a normalised interface described below.
    102 Both instruction and data caches are blocking : the processor is supposed to be frozen in case of MISS (uncached read acces are handled as MISS). Both caches are direct mapping, and the write policy for the data cache is WRITE-THROUGH. The cache controler contains a write buffer supporting up to 8 fposted write requests. In case of successive write requests to contiguous addresses, the cache controler will build a single VCI burst. Therefore, the procesor can be blocked in case of MISS on a read request, but is generally not blocked in case of write request.
     129The hardware component '''!VciXcache''' is a generic cache controller that can be used by various processor cores.
     130
     131It contains separate instruction and data caches, but has a single VCI port to access the VCI interconnect.
     132
     133The cache line width and the cache size are defined as independent parameters for the data cache and the instruction cache.
     134
     135On the processor side, the cache controller can receive two requests at each cycle: one instruction request (read only), and one data request (read or write). Those requests and the corresponding responses are transmitted through a normalised interface described below.
     136
     137Both instruction and data caches are blocking: the processor is supposed to be frozen in case of MISS (uncached read accesses are handled as MISS).
     138
     139Both caches are direct mapped, and the write policy for the data cache is WRITE-THROUGH. The cache controller contains a write buffer supporting up to 8 posted write requests. In case of successive write requests to contiguous addresses, the cache controller will build a single VCI burst. Therefore, the processor can be blocked in case of MISS on a read request, but is generally not blocked in case of a write request.
     140
    103141Finally, in order to guarantee strongly ordered memory consistency, the '''!VciXcache''' controller sequentializes the memory accesses, strictly respecting the access ordering defined by the processor on the '''!VciXcache''' interface. As the VCI interconnect does not guarantee in-order delivery, the cache controller waits for the VCI response packet corresponding to transaction (n) before sending the VCI command packet corresponding to transaction (n+1).
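The grouping of posted writes into bursts could look like the sketch below. It is not the '''!VciXcache''' implementation: `PostedWrite`, `WriteBurst` and `buildBursts()` are hypothetical names, only word writes (4 bytes) are considered, and two writes are treated as contiguous when the second address immediately follows the last word of the current burst. The processor ordering is preserved, so non-contiguous writes are never reordered or merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one posted word write held in the write buffer.
struct PostedWrite {
    uint32_t address;
    uint32_t wdata;
};

// Hypothetical record for one burst: a start address and consecutive words.
struct WriteBurst {
    uint32_t address;
    std::vector<uint32_t> wdata;
};

// Depth of the write buffer, as stated in the text (8 posted writes).
const std::size_t kWriteBufferDepth = 8;

// Groups successive writes to contiguous word addresses into single bursts,
// keeping the order in which the processor issued them.
inline std::vector<WriteBurst> buildBursts(const std::vector<PostedWrite> &writes) {
    assert(writes.size() <= kWriteBufferDepth);
    std::vector<WriteBurst> bursts;
    for (std::size_t i = 0; i < writes.size(); ++i) {
        const PostedWrite &w = writes[i];
        assert((w.address & 3u) == 0);  // this sketch only handles aligned word writes
        if (!bursts.empty()) {
            WriteBurst &last = bursts.back();
            uint32_t next = last.address + 4u * static_cast<uint32_t>(last.wdata.size());
            if (w.address == next) {    // contiguous with the current burst: extend it
                last.wdata.push_back(w.wdata);
                continue;
            }
        }
        WriteBurst burst;               // otherwise start a new burst
        burst.address = w.address;
        burst.wdata.push_back(w.wdata);
        bursts.push_back(burst);
    }
    return bursts;
}
```

For instance, writes to 0x100, 0x104, 0x108 and then 0x200 would give one burst of three words followed by one burst of a single word.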
    104142
     
    127165 = E) CABA modeling =
    128166
    129 The CABA modeling for a complete CPU (processor + cache) is presented in figure 2.
    130 The processor ISS is wrapped in the generic CABA wrapper, implemented by the class '''!IssWrapper'''..
    131 The class '''!IssWrapper''' contains the member variable '''m_iss''' representing the processor ISS. The type of the '''m_iss''' variable - defining the type of the
    132 wrapped processor - is specified by the template parameter '''iss_t'''. The class '''!IssWrapper''' inherit the class '''caba::!ModuleBase''', that is the basis for all CABA modules.
     167The CABA modeling for a complete CPU (processor + cache) is presented in figure 2.
     168
     169The processor ISS is wrapped in the generic CABA wrapper, implemented by the class '''!IssWrapper'''.
     170
     171The class '''!IssWrapper''' contains the member variable '''m_iss''' representing the processor ISS.
     172The type of the '''m_iss''' variable - defining the type of the wrapped processor - is specified by the template parameter '''iss_t'''.
     173The class '''!IssWrapper''' inherits the class '''caba::!BaseModule''', that is the basis for all CABA modules.
    133174
    134175[[Image(caba_wrapper.png, nolink)]]
    135176
    136 To communicate with the '''!VciXcache''', the '''!IssWrapper''' class contains two member variables '''p_icache''', of type '''!IcacheProcessorPort''' and '''p_dcache''', of type '''!DcacheProcessorPort'''. It contains also the member variable '''p_irq''', that is a pointer to an array of ports of type '''sc_in<bool>'''. This array represents the interrupt ports. The number N of interrupt ports depends on the wrapped processor, an is defined by the '''n_irq''' member variable of the '''iss_t''' class.
     177To communicate with the '''!VciXcache''', the '''!IssWrapper''' class contains two member variables '''p_icache''', of type '''!IcacheProcessorPort''' and '''p_dcache''', of type '''!DcacheProcessorPort'''.
     178It also contains the member variable '''p_irq''', that is a pointer to an array of ports of type '''sc_in<bool>'''.
     179This array represents the interrupt ports. The number N of interrupt ports depends on the wrapped processor, and is defined by the '''n_irq''' static member variable of the '''iss_t''' class.
    137180
    138181The SystemC code for the generic CABA wrapper is presented below :
     
    243286
    244287= F) TLM-T modeling =
    245 The TLM-T modeling for a complete CPU (processor + cache) is presented in figure 3.
    246 To increase the simulation speed, the TLM-T wrapper is the cache controller itself, and it is implemented as the class ''' !VciXcache'''. This class contains the SC_THREAD '''execLoop()''' implementing the PDES process, and the '''m_time''' member variable implementing the associated local clock. The class '''!VciXcache''' inherit the class '''tlmt::!ModuleBase''', that is the basis for all TLM-T modules.
     288
     289The TLM-T modeling for a complete CPU (processor + cache) is presented in figure 3.
     290
     291To increase the simulation speed, the TLM-T wrapper is the cache controller itself, and it is implemented as the class '''!VciXcache'''. This class contains the SC_THREAD '''execLoop()''' implementing the PDES process, and the '''m_time''' member variable implementing the associated local clock.
     292
     293The class '''!VciXcache''' inherits from the class '''tlmt::!ModuleBase''', which is the basis for all TLM-T modules.
     294
    247295This class contains the member variable '''m_iss''' representing the processor ISS. The type of the '''m_iss''' variable is defined by the template parameter '''iss_t'''.
    248296
     
    250298
    251299The class '''!VciXcache''' contains a member variable '''p_vci''', of type '''!VciInitPort''', to send VCI command packets and receive VCI response packets.
    252 This class contains also the member variable '''p_irq''', that is a pointer to an array of ports of type '''SynchroInPort'''. This array represents the interrupt ports. The number N of interrupt ports depends on the wrapped processor, an is defined by the '''n_irq''' member variable of the '''iss_t''' class.
    253 
    254 The '''execLoop()''' function contains an infinite loop. One iteration in this loop corresponds to one cycle for the local clock, (or more, as the thread is suspended in case of MISS).
     300
     301This class also contains the member variable '''p_irq''', that is a pointer to an array of ports of type '''SynchroInPort'''. This array represents the interrupt ports. The number N of interrupt ports depends on the wrapped processor, and is defined by the '''n_irq''' member variable of the '''iss_t''' class.
     302
     303The '''execLoop()''' function contains an infinite loop. One iteration in this loop corresponds to one cycle for the local clock (or more, as the thread is suspended in case of MISS).
    255304
    256305The cache behavior is specifically described by the '''cacheAccess()''' method, that is a member function of the class '''!VciXcache''', and is called by '''execLoop()'''  at each cycle. This function has the following prototype :
     
    264313{{{
    265314class icache_request_t {
    266 bool  valid ;
    267 enum InsAccessType  type ;
    268 uint32_t  address ;
    269 }
     315   bool  valid ;
     316   enum InsAccessType  type ;
     317   uint32_t  address ;
     318};
    270319class dcache_request_t {
    271 bool  valid ;
    272 enum DataAccessType  type ;
    273 uint32_t  address ;
    274 uint32_t  wdata ;
    275 }
     320   bool  valid ;
     321   enum DataAccessType  type ;
     322   uint32_t  address ;
     323   uint32_t  wdata ;
     324};
    276325class xcache_response_t {
    277 bool  iber ;
    278 uint32_t  instruction ;
    279 bool  dber ;
    280 uint32_t  rdata ;
    281 }
    282 }}}
     326   bool  iber ;
     327   uint32_t  instruction ;
     328   bool  dber ;
     329   uint32_t  rdata ;
     330};
     331}}}
     332
    283333The '''cacheAccess()''' function determines the actions to be done:
    284334 * In case of data or instruction MISS, the '''cacheAccess()''' function sends the proper VCI command packet on the '''p_vci''' port, and the '''execLoop()''' thread is suspended.