Changes between Initial Version and Version 1 of Writing Rules/RISC


Timestamp:
Jan 7, 2008, 1:02:03 PM (16 years ago)
Author:
alain
{{{
#!html
<h1>A general method for SystemC modeling of RISC processors</h1>
}}}

Authors: Alain Greiner, François Pécheux, Nicolas Pouillon

[[PageOutline]]

The goal of the method presented here is to simplify the SystemC modeling of a specific class of embedded processors. The method is well suited to 32-bit RISC processors with a single instruction issue per cycle and blocking instruction and data caches.

= A) General principles =

The method relies on three basic principles:
 * The processor core is modeled as a generic ISS (Instruction Set Simulator).
 * This ISS is wrapped in appropriate wrappers for several types of simulation models: CABA, TLM-T and PV.
 * All processor types use the same generic cache controller.

On the one hand, the same ISS is encapsulated in different wrappers to generate several simulation models, corresponding to several abstraction levels: CABA (Cycle-Accurate Bit-Accurate), TLM-T (Transaction Level Models with Time), and PV (Programmer View, untimed). On the other hand, the same wrapper can be used for different processor architectures. As illustrated below, the set of simulation models is the Cartesian product of the set of ISS by the set of wrappers.

||                || CABA Wrapper        || TLM-T Wrapper        || PV Wrapper        ||
|| ISS MIPSR3000  || CABA Model MIPS     || TLM-T Model MIPS     || PV Model MIPS     ||
|| ISS PPC405     || CABA Model PPC      || TLM-T Model PPC      || PV Model PPC      ||
|| ISS OpenRISC   || CABA Model OpenRISC || TLM-T Model OpenRISC || PV Model OpenRISC ||

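The Cartesian product in the table maps naturally onto C++ templates: each wrapper is a class template parameterized by the ISS type. The sketch below only illustrates that idea. All class names in it (`MipsIss`, `Ppc405Iss`, `CabaWrapper`, `PvWrapper`) are hypothetical stand-ins, not SoCLib classes, and the "model" is reduced to a name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two ISS classes; only a name is modeled here.
struct MipsIss   { static std::string name() { return "MIPS"; } };
struct Ppc405Iss { static std::string name() { return "PPC"; } };

// Hypothetical wrappers, each parameterized by the ISS type (iss_t).
template <typename iss_t>
struct CabaWrapper {
    static std::string model() { return "CABA Model " + iss_t::name(); }
};

template <typename iss_t>
struct PvWrapper {
    static std::string model() { return "PV Model " + iss_t::name(); }
};
```

Any wrapper can be instantiated with any ISS, for example `CabaWrapper<MipsIss>` or `PvWrapper<Ppc405Iss>`, so adding one ISS yields one new model per existing wrapper.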
The method has been demonstrated for the MIPSR3000 and PPC 405 processors, and can readily be extended to the OpenRISC, SPARC, Nios, and MicroBlaze processors.

This modeling approach requires all ISS to implement the same generic API (Application Programming Interface), since this API must be independent of both the processor architecture and the wrapper type.

The proposed method assumes that the processors use the '''VciXcache''' cache controller, available in the SoCLib library, to interface with the VCI interconnect. This modular approach shares the modeling effort of the L1 cache controller across processors. The functional validation and debugging of this component was a tedious task, which makes such reuse worthwhile. Nevertheless, a clean procedural interface has been defined between the processor core and the cache controller, so the cache behaviour can be modified if required.

Finally, this generic approach has been exploited to develop the gdbServer module, which is needed to debug the multi-task application software running on the MP-SoC architectures modeled with SoCLib. This tool can be used with all simulation models compliant with the method described below.

= B) Generic ISS API =

As explained in the introduction, the modeling method relies on a generic ISS API, usable by any 32-bit RISC processor and by the three wrappers (CABA, TLM-T and PV). The Instruction Set Simulator corresponding to a given processor handles a set of registers defining the internal state of the processor. The API described below defines a procedural interface that allows the various wrappers to access those registers. The main access function is the '''step()''' function, which executes one ISS step. For an untimed model (PV wrapper), one step corresponds to one instruction. For a timed model (CABA or TLM-T wrapper), one step corresponds to one cycle.

 * '''inline void reset()'''
This function resets all registers defining the internal state of the processor.

 * '''inline bool isBusy()'''
This function is only used by the timed wrappers (CABA and TLM-T). In RISC processors, most instructions have a visible latency of one cycle, but some instructions (such as multiplication or division) can have a visible latency larger than one cycle. The CABA and TLM-T wrappers call this function before executing one step: if the processor is busy, the wrapper calls the '''nullStep()''' function; if the processor is available, the wrapper may call the '''step()''' function to execute one instruction.

 * '''inline void step()'''
This function executes one instruction. All internal registers of the processor can be modified.

 * '''inline void nullStep()'''
This function performs one internal step of a long instruction.

 * '''inline void getInstructionRequest (bool & req, enum InsAccessType & type, uint32_t & address)'''
This function is used by the wrapper to obtain the instruction request parameters from the ISS. The '''req''' parameter is true when there is a valid request. The '''address''' parameter is the instruction address. The '''type''' parameter can take the values defined below:
{{{
enum InsAccessType {
    RC,  // Read Instruction Cached
    RU   // Read Instruction Uncached
};
}}}

 * '''inline void getDataRequest (bool & req, enum DataAccessType & type, uint32_t & address, uint32_t & wdata)'''
This function is used by the wrapper to obtain the data request parameters from the ISS. The '''req''' parameter is true when there is a valid request. The '''address''' parameter is the data address, and the '''wdata''' parameter is the data value to be written. The '''type''' parameter can take the values defined below:
{{{
enum DataAccessType {
    RW,   // Read Word Cached
    RH,   // Read Half Cached
    RB,   // Read Byte Cached
    RZ,   // Cache Line Invalidate
    RWU,  // Read Word Uncached
    RHU,  // Read Half Uncached
    RBU,  // Read Byte Uncached
    WW,   // Write Word
    WH,   // Write Half
    WB,   // Write Byte
    SC,   // Store Conditional word
    LL    // Load Linked word
};
}}}

 * '''inline void setInstruction (bool error, uint32_t ins)'''
This function is used by the wrapper to transmit to the ISS the instruction to be executed ('''ins''' parameter). In case of exception (bus error), the '''error''' parameter is set.

 * '''inline void setRdata (bool error, uint32_t rdata)'''
This function is used by the wrapper to transmit the read data to the ISS ('''rdata''' parameter). In case of exception (bus error), the '''error''' parameter is set.

 * '''inline void setWriteBerr ()'''
This function is used by the wrapper to signal a bus error on a write access. Such an error is asynchronous, because write accesses are non-blocking for the processor.

 * '''inline void setIrq (uint32_t irq)'''
This function is used by the wrapper to signal the current value of the interrupt lines. For each processor, the number of interrupt lines must be defined by the ISS variable '''n_irq'''.

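The interplay between '''isBusy()''', '''step()''' and '''nullStep()''' can be illustrated with a toy ISS. The sketch below is not one of the ISS described on this page. `ToyIss`, its fixed 3-cycle latency, and the `runCycles()` helper are assumptions made for the example, and the instruction and data request functions are left out.

```cpp
#include <cassert>
#include <cstdint>

// Toy ISS, for illustration only: every instruction is treated as a
// 3-cycle operation so that the busy mechanism is visible.
class ToyIss {
public:
    ToyIss() : m_pc(0), m_busy(0), m_executed(0) {}

    void reset() { m_pc = 0; m_busy = 0; m_executed = 0; }

    // True while a long instruction is still in progress.
    bool isBusy() const { return m_busy > 0; }

    // Start one instruction: advance the PC and arm the latency counter.
    void step() { m_pc += 4; ++m_executed; m_busy = kLatency - 1; }

    // Consume one cycle of the instruction in progress.
    void nullStep() { if (m_busy > 0) --m_busy; }

    uint32_t pc() const { return m_pc; }
    uint32_t executed() const { return m_executed; }

private:
    static constexpr uint32_t kLatency = 3;  // assumed visible latency, in cycles
    uint32_t m_pc;
    uint32_t m_busy;
    uint32_t m_executed;
};

// Timed-wrapper loop: exactly one ISS call per simulated cycle.
inline void runCycles(ToyIss& iss, uint32_t cycles) {
    for (uint32_t c = 0; c < cycles; ++c) {
        if (iss.isBusy()) iss.nullStep();
        else              iss.step();
    }
}
```

With this latency, six simulated cycles execute two instructions: each instruction takes one '''step()''' call followed by two '''nullStep()''' calls.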
= C) ISS internal organisation =

As an example, we present the general structure of the MIPS R3000 ISS, as depicted in the timing diagram of figure 2. The instruction fetch, instruction decode, and instruction execution are done in one cycle. A specific register '''r_npc''' is introduced to model the delayed branch mechanism: the instruction following a branch instruction is always executed. Load instructions are executed in two cycles, as they require two cache accesses (one for the instruction, one for the data). The ISS can issue two simultaneous requests, one to the instruction cache and one to the data cache, but those requests belong to different instructions.

FIGURE 2

The '''r_pc''' and '''r_npc''' registers contain respectively the current instruction address and the next instruction address. The wrapper can obtain the PC content using the '''getInstructionRequest()''' function, fetch the instruction in the cache (or in memory in case of MISS), and propagate the requested instruction to the ISS using the '''setInstruction()''' function. The wrapper can then start the instruction execution using the '''step()''' function. The general registers '''r_gp''', the '''r_mem''' registers defining the possible data access, and the '''r_pc''' and '''r_npc''' registers are then updated. If, at the end of cycle (i), the '''r_mem''' registers contain a valid data access, this access is performed during the next cycle, in parallel with the execution of instruction (i+1).

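The role of '''r_npc''' in the delayed branch mechanism can be sketched in a few lines. This is a minimal illustration and not the actual MIPS R3000 ISS code: the `PcState` structure and the `advance()` function are hypothetical, and the branch decision (`taken`, `target`) is assumed to come from the decoding of the instruction at '''r_pc'''.

```cpp
#include <cassert>
#include <cstdint>

struct PcState {
    uint32_t r_pc;   // address of the instruction being executed
    uint32_t r_npc;  // address of the instruction executed next
};

// Update the register pair after executing the instruction at r_pc.
// A taken branch only changes what follows r_npc, so the instruction
// in the delay slot (at r_npc) is always executed.
inline void advance(PcState& s, bool taken, uint32_t target) {
    uint32_t after_next = taken ? target : s.r_npc + 4;
    s.r_pc  = s.r_npc;
    s.r_npc = after_next;
}
```

Starting from `r_pc = 0x100`, a taken branch to `0x200` gives the execution order `0x100`, `0x104` (delay slot), `0x200`.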

From an implementation point of view, a specific ISS is implemented by a class '''processorIss'''. This class inherits from the class '''genericIss''', which defines the characteristics common to all ISS, including the prototypes of the access functions presented in section B. Those functions are declared as virtual functions in the class '''genericIss'''.

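The inheritance scheme described above can be sketched as follows. This is a reduced illustration under stated assumptions, not the SoCLib '''genericIss''' class: `GenericIssSketch` declares only four of the access functions of section B, and `CountingIss` and `runSteps()` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced version of the common base class.
class GenericIssSketch {
public:
    virtual ~GenericIssSketch() {}
    virtual void reset() = 0;
    virtual bool isBusy() = 0;
    virtual void step() = 0;
    virtual void nullStep() = 0;
};

// Hypothetical processor-specific ISS that only counts executed steps.
class CountingIss : public GenericIssSketch {
public:
    CountingIss() : m_steps(0) {}
    void reset() override { m_steps = 0; }
    bool isBusy() override { return false; }
    void step() override { ++m_steps; }
    void nullStep() override {}
    uint32_t steps() const { return m_steps; }
private:
    uint32_t m_steps;
};

// Code written against the base class works for any derived ISS.
inline void runSteps(GenericIssSketch& iss, uint32_t n) {
    iss.reset();
    for (uint32_t i = 0; i < n; ++i) {
        if (iss.isBusy()) iss.nullStep();
        else              iss.step();
    }
}
```

Declaring the functions in the base class fixes the prototypes once, so every processor-specific ISS exposes the same interface to the wrappers.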
= D) Generic cache controller =

The hardware component '''VciXcache''' is a generic cache controller that can be used by various processor cores. It contains separate instruction and data caches, but has a single VCI port to access the VCI interconnect. The cache line width and the cache size are independent parameters for the data cache and the instruction cache. On the processor side, the cache controller can receive two requests at each cycle: one instruction request (read only) and one data request (read or write). Those requests and the corresponding responses are transmitted through a normalised interface described below.

Both instruction and data caches are blocking: the processor is supposed to be frozen in case of MISS (uncached read accesses are handled as MISS). Both caches are direct-mapped, and the write policy for the data cache is WRITE-THROUGH. The cache controller contains a write buffer supporting up to 8 posted write requests. In case of successive write requests to contiguous addresses, the cache controller builds a single VCI burst. Therefore, the processor can be blocked in case of MISS on a read request, but is generally not blocked in case of a write request.

Finally, in order to guarantee strongly ordered memory consistency, the '''VciXcache''' controller serializes the memory accesses, strictly respecting the access order defined by the processor on the '''VciXcache''' interface. As the VCI interconnect does not guarantee in-order delivery, the cache controller waits for the VCI response packet corresponding to transaction (n) before sending the VCI command packet corresponding to transaction (n+1).

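As an illustration of the direct-mapped organisation, the sketch below shows how a byte address can be split into a line index and a tag, given a number of lines and a number of 32-bit words per line. It is not the '''VciXcache''' implementation. The class `DirectMappedCache` and its member names are hypothetical, and only the hit/miss decision is modeled (no data, no write buffer).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class DirectMappedCache {
public:
    DirectMappedCache(uint32_t nlines, uint32_t nwords)
        : m_nlines(nlines), m_line_bytes(4 * nwords),
          m_valid(nlines, false), m_tag(nlines, 0) {}

    // Each address maps to exactly one line.
    uint32_t lineIndex(uint32_t addr) const { return (addr / m_line_bytes) % m_nlines; }
    uint32_t tag(uint32_t addr) const { return addr / (m_line_bytes * m_nlines); }

    // HIT when the unique candidate line is valid and holds the same tag.
    bool hit(uint32_t addr) const {
        uint32_t i = lineIndex(addr);
        return m_valid[i] && m_tag[i] == tag(addr);
    }

    // Line refill after a MISS: replaces whatever the line contained.
    void fill(uint32_t addr) {
        uint32_t i = lineIndex(addr);
        m_valid[i] = true;
        m_tag[i] = tag(addr);
    }

private:
    uint32_t m_nlines;
    uint32_t m_line_bytes;
    std::vector<bool> m_valid;
    std::vector<uint32_t> m_tag;
};
```

With 4 lines of 4 words, addresses `0x10` and `0x50` map to the same line with different tags, so filling one evicts the other.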
To communicate with the processor, the CABA model of the '''VciXcache''' component contains the two ports defined below:

class IcacheCachePort {
public:
sc_in<bool>  req;  // valid request
sc_in<sc_dt::sc_uint<2> >  type;  // instruction access type
sc_in<sc_dt::sc_uint<2> >  mode;  // processor mode
sc_in<sc_dt::sc_uint<32> >  adr;  // instruction address
sc_out<bool>  frz;  // frozen processor
sc_out<sc_dt::sc_uint<32> >  ins;  // instruction
sc_out<bool>  berr;  // bus error
};

class DcacheCachePort {
public:
sc_in<bool>  req;  // valid request
sc_in<sc_dt::sc_uint<4> >  type;  // data access type
sc_in<sc_dt::sc_uint<2> >  mode;  // processor mode
sc_in<sc_dt::sc_uint<32> >  wdata;  // data to be written
sc_in<sc_dt::sc_uint<32> >  adr;  // data address
sc_out<bool>  frz;  // frozen processor
sc_out<sc_dt::sc_uint<32> >  rdata;  // read data
sc_out<bool>  berr;  // bus error
};

The possible values for the data access '''type''' are defined below:

enum data_access_type {
RW,   // Read Word Cached
RH,   // Read Half Cached
RB,   // Read Byte Cached
RZ,   // Cache Line Invalidate
RWU,  // Read Word Uncached
RHU,  // Read Half Uncached
RBU,  // Read Byte Uncached
WW,   // Write Word
WH,   // Write Half
WB,   // Write Byte
SC,   // Store Conditional word
LL    // Load Linked word
};

The possible values for the instruction access '''type''' are defined below:

enum ins_access_type {
RC,  // Read Instruction Cached
RU   // Read Instruction Uncached
};

The '''mode''' parameter, defining the processor mode, is used for access right checking. It is not used by the '''VciXcache''' component, but it is mandatory for a cache controller supporting a page-based protection scheme. Due to the pipelined structure of the processor, the '''mode''' parameter can have different values for the instruction and data requests. The possible values are defined below:

enum access_mode {
USER,
KERNEL,
HYPER
};

= E) CABA modeling =

The CABA modeling for a complete CPU (processor + cache) is presented in figure 3.
The processor ISS is wrapped in the generic CABA wrapper, implemented by the class '''!IssWrapper'''.
This class inherits from the class '''!Caba::BaseModule''', which is the basis for all CABA modules.
The class '''!IssWrapper''' contains the member variable '''m_iss''' representing the processor ISS. The type of the '''m_iss''' variable is defined by the template parameter '''iss_t'''.

FIGURE 3

To communicate with the '''!VciXcache''', the '''!IssWrapper''' class contains two member variables: '''p_icache''', of type '''!IcacheProcessorPort''', and '''p_dcache''', of type '''!DcacheProcessorPort'''.

This class also contains N member variables '''p_irq[i]''', of type '''sc_in<bool>''', representing the interrupt ports. The number N of interrupt ports depends on the wrapped ISS, and is defined by the '''n_irq''' member of the '''iss_t''' class.

The CABA wrapper is presented below:

{{{
template<typename iss_t>
class IssWrapper : public Caba::BaseModule
{
public:

    //////// ports ////////
    sc_in<bool>          p_irq[iss_t::n_irq];
    IcacheProcessorPort  p_icache;
    DcacheProcessorPort  p_dcache;
    sc_in<bool>          p_resetn;
    sc_in<bool>          p_clk;

    ///////// constructor ///////////
    IssWrapper(sc_module_name insname, int ident)
        : BaseModule(insname),
          p_icache("icache"),
          p_dcache("dcache"),
          p_resetn("resetn"),
          p_clk("clk"),
          m_iss(ident)
    {
        SC_METHOD(transition);      // ISS evaluation on the rising edge
        dont_initialize();
        sensitive << p_clk.pos();
        SC_METHOD(genMoore);        // output update on the falling edge
        dont_initialize();
        sensitive << p_clk.neg();
    }

protected:
    SC_HAS_PROCESS(IssWrapper);     // required by SC_METHOD in SystemC 2.x

private:

    ///////// Variables /////////
    iss_t                  m_iss;
    bool                   m_ins_asked;
    enum ins_access_type   m_ins_type;
    enum access_mode       m_ins_mode;
    uint32_t               m_ins_addr;
    bool                   m_mem_asked;
    enum data_access_type  m_mem_type;
    enum access_mode       m_mem_mode;
    uint32_t               m_mem_addr;
    uint32_t               m_mem_wdata;

    /////////////////////////
    void transition()
    {
        if ( !p_resetn.read() ) {
            m_iss.reset();
            return;
        }
        bool frozen = false;
        // Note: both calls pass a mode argument in addition to the
        // parameters listed in section B.
        m_iss.getDataRequest(m_mem_asked,
                             m_mem_type,
                             m_mem_mode,
                             m_mem_addr,
                             m_mem_wdata);
        m_iss.getInstructionRequest(m_ins_asked,
                                    m_ins_type,
                                    m_ins_mode,
                                    m_ins_addr);
        if ( m_ins_asked ) {
            if ( p_icache.frz.read() ) frozen = true;
            else m_iss.setInstruction(p_icache.berr.read(), p_icache.ins.read());
        }
        if ( m_mem_asked ) {
            if ( p_dcache.frz.read() ) frozen = true;
            else m_iss.setRdata(false, p_dcache.rdata.read());
        }
        if ( frozen || m_iss.isBusy() ) {   // Processor frozen or busy
            m_iss.nullStep();
        } else {                            // Execute one instruction
            // Pack the interrupt lines into one word, line i on bit i
            uint32_t irqword = 0;
            for ( size_t i = 0; i < (size_t)iss_t::n_irq; i++ ) {
                if ( p_irq[i].read() ) irqword |= (1 << i);
            }
            m_iss.setIrq(irqword);
            m_iss.step();
        }
    } // end transition()

    //////////////////////////////
    void genMoore()
    {
        p_icache.req   = m_ins_asked;
        p_icache.type  = m_ins_type;
        p_icache.mode  = m_ins_mode;
        p_icache.adr   = m_ins_addr;
        p_dcache.req   = m_mem_asked;
        p_dcache.type  = m_mem_type;
        p_dcache.mode  = m_mem_mode;
        p_dcache.adr   = m_mem_addr;
        p_dcache.wdata = m_mem_wdata;
    } // end genMoore()
};
}}}

= F) TLM-T modeling =

The TLM-T modeling for a complete CPU (processor + cache) is presented in figure 4.
To increase the simulation speed, the TLM-T wrapper is the cache controller itself, implemented as the class '''!VciXcache'''. This class contains the SC_THREAD '''execLoop()''' implementing the PDES (Parallel Discrete Event Simulation) process, and the '''m_time''' member variable implementing the associated local clock. The class '''!VciXcache''' inherits from the class '''tlmt::BaseModule''', which is the basis for all TLM-T modules.
This class contains the member variable '''m_iss''' representing the processor ISS. The type of the '''m_iss''' variable is defined by the template parameter '''iss_t'''.

FIGURE 4

The class '''!VciXcache''' contains a member variable '''p_vci''', of type '''!VciInitiatorPort''', to send VCI command packets and receive VCI response packets.
This class also contains N member variables '''p_irq[i]''', of type '''!IrqInPort''', representing the interrupt inputs. The number N of interrupt ports depends on the wrapped ISS, and is defined by the '''n_irq''' member of the '''iss_t''' class.

The '''execLoop()''' function contains an infinite loop. One iteration of this loop corresponds to one cycle of the local clock, or to more than one cycle in case of MISS, since the thread is then suspended.

The cache behavior is described by the '''cacheAccess()''' method, which is a member function of the class '''!VciXcache'''. This function is called in the main execution loop (i.e. at each cycle). It has the following prototype:
{{{
void cacheAccess(icache_request_t   *ireq,
                 dcache_request_t   *dreq,
                 xcache_response_t  *rsp);
}}}

The '''icache_request_t''', '''dcache_request_t''', and '''xcache_response_t''' classes represent the instruction request, the data request, and the cache response respectively:
{{{
class icache_request_t {
public:
    bool                  valid;
    enum ins_access_type  type;
    enum access_mode      mode;
    uint32_t              address;
};
class dcache_request_t {
public:
    bool                   valid;
    enum data_access_type  type;
    enum access_mode       mode;
    uint32_t               address;
    uint32_t               wdata;
};
class xcache_response_t {
public:
    bool      iber;         // instruction bus error
    uint32_t  instruction;
    bool      dber;         // data bus error
    uint32_t  rdata;
};
}}}
The '''cacheAccess()''' function determines the actions to be done.
In case of data or instruction MISS, the '''cacheAccess()''' function sends the proper VCI command packet on the '''p_vci''' port, and the '''execLoop''' thread is suspended.
In case of data write, the '''cacheAccess()''' function sends the proper VCI command packet on the '''p_vci''' port, but the '''execLoop''' thread is not suspended.

At each iteration of the execution loop, the '''cacheAccess()''' method updates the local clock (variable '''m_time'''):
 * The local time is simply incremented by one cycle if the cache controller is able to answer the processor requests immediately.
 * The local time is updated using the date contained in the VCI response packet in case of MISS.

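The two clock update rules, together with the lookahead counter used by the execution loop, can be sketched without SystemC. This is only an illustration: the `LocalClock` structure and its function names are hypothetical, the time is a plain integer rather than a TLM-T time object, and the sketch assumes the response date simply replaces the local time.

```cpp
#include <cassert>
#include <cstdint>

struct LocalClock {
    uint32_t time;       // plays the role of m_time
    uint32_t counter;    // plays the role of m_counter
    uint32_t lookahead;  // plays the role of m_lookahead

    explicit LocalClock(uint32_t la) : time(0), counter(0), lookahead(la) {}

    // The cache answers immediately: one cycle elapses.
    void onHit() { time += 1; }

    // MISS: the local time becomes the date carried by the response packet
    // (assumption: the date is taken as-is).
    void onMiss(uint32_t responseDate) { time = responseDate; }

    // Called once per loop iteration. Returns true when the thread should
    // yield to the simulation kernel, which the model does with wait(SC_ZERO_TIME).
    bool endOfIteration() {
        if (++counter >= lookahead) {
            counter = 0;
            return true;
        }
        return false;
    }
};
```

A larger lookahead means fewer context switches and a faster simulation, at the cost of letting the local clock run further ahead before the thread yields.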
The TLM-T model for the VciXcache module is presented below:
{{{
template<typename iss_t, typename vci_param>
class VciXcache : public tlmt::BaseModule {

public:
    /////// ports ///////
    VciInitiatorPort<vci_param>  p_vci;
    IrqInPort                    p_irq[iss_t::n_irq];

    /////// constructor /////
    VciXcache(sc_module_name  name,
              uint32_t  initiatorIndex,
              uint32_t  processorIdent,
              uint32_t  lookahead,
              uint32_t  dcache_nlines,
              uint32_t  dcache_nwords,
              uint32_t  icache_nlines,
              uint32_t  icache_nwords)
        : BaseModule(name),
          p_vci("vci", this, &VciXcache::rspReceived, &m_time),
          m_time(0),
          m_iss(processorIdent)
    {
        // Each interrupt port p_irq[i] is meant to be attached to irqReceived()
        // with the arguments ("irq", i, this, &VciXcache::irqReceived).
        // Array elements cannot receive constructor arguments in the
        // initializer list, so the binding call depends on the IrqInPort class.
        for (uint32_t i = 0; i < iss_t::n_irq; i++) {
            m_irqpending[i] = false;
            m_irqtime[i] = 0;
        }
        m_initiator_index = initiatorIndex;
        m_counter = 0;
        m_lookahead = lookahead;
        m_icache_nlines = icache_nlines;
        m_icache_nwords = icache_nwords;
        m_dcache_nlines = dcache_nlines;
        m_dcache_nwords = dcache_nwords;
        SC_THREAD(execLoop);
    } // end constructor

protected:
    SC_HAS_PROCESS(VciXcache);  // required by SC_THREAD in SystemC 2.x

private:
    ///////  member variables
    tlmt_time  m_time;
    iss_t      m_iss;
    uint32_t   m_dcache_nlines;
    uint32_t   m_dcache_nwords;
    uint32_t   m_icache_nlines;
    uint32_t   m_icache_nwords;
    uint32_t   m_initiator_index;
    uint32_t   m_lookahead;
    uint32_t   m_counter;
    bool       m_irqpending[iss_t::n_irq];
    uint32_t   m_irqtime[iss_t::n_irq];
    vci_cmd_t  m_cmd;

    ////////////////// thread
    void execLoop()
    {
        icache_request_t   icache_req;   // The Icache request
        dcache_request_t   dcache_req;   // The Dcache request
        xcache_response_t  xcache_rsp;   // The Xcache response
        uint32_t  irqword;
        while (1) {
            // execute one cycle
            if (m_iss.isBusy()) {
                m_iss.nullStep();
            } else {
                m_iss.getInstructionRequest(icache_req.valid,
                                            icache_req.type,
                                            icache_req.mode,
                                            icache_req.address);
                m_iss.getDataRequest(dcache_req.valid,
                                     dcache_req.type,
                                     dcache_req.mode,
                                     dcache_req.address,
                                     dcache_req.wdata);
                cacheAccess(&icache_req, &dcache_req, &xcache_rsp);
                m_iss.setInstruction(xcache_rsp.iber, xcache_rsp.instruction);
                // isRead() is not declared in the dcache_request_t class shown above
                if (dcache_req.isRead()) m_iss.setRdata(xcache_rsp.dber, xcache_rsp.rdata);
                // An interrupt is visible once its date has been reached
                irqword = 0;
                for (size_t i = 0; i < iss_t::n_irq; i++) {
                    if (m_irqpending[i] && m_irqtime[i] <= get_time()) irqword |= (1 << i);
                }
                m_iss.setIrq(irqword);
                m_iss.step();
            } // end cycle
            // lookahead management
            m_counter++;
            if (m_counter >= m_lookahead) {
                m_counter = 0;
                wait(SC_ZERO_TIME);
            } // end if lookahead
        } // end while(1)
    } // end execLoop()

    /////////////////////////////////////////////////////
    void cacheAccess(icache_request_t   *ireq,
                     dcache_request_t   *dreq,
                     xcache_response_t  *rsp)
    {
    } // end cacheAccess()

    ////////////////////////////
    void rspReceived(vci_rsp_t  rsp,
                     uint32_t   time)
    {
    } // end rspReceived()

    ////////////////////////////
    void irqReceived(bool      val,
                     uint32_t  time,
                     size_t    index)
    {
        m_irqpending[index] = val;
        m_irqtime[index] = time;
    } // end irqReceived()
     480
     481       
     482
     483
     484