GPU/CUDA: Maximum number of blocks of a grid and maximum number of resident blocks per multiprocessor
My GPU has compute capability 2.1 and 2 SMs, and each SM has 48 cores. According to the technical specifications in the CUDA C Programming Guide, the maximum number of blocks in a grid is 65535, and the maximum number of resident blocks per multiprocessor is 8.
I am confused about how many blocks I can launch. If the maximum number of blocks per SM is 8, doesn't that mean I can launch at most 16 blocks when there are 2 SMs? Yet I launched more blocks than that.
Maybe there are such things as active blocks and inactive blocks? If so, how are these blocks scheduled? Does an inactive block just wait until one of the 8 active blocks has finished? But that brings synchronization problems...
Some more questions: if there are 48 cores on each SM, 3 half-warps can execute at the same time, and shared memory has 32 banks. If 2 threads try to read the same bank concurrently, won't that produce a bank conflict if they belong to different half-warps?
"According to the technical specifications in the CUDA C Programming Guide, the maximum number of blocks in a grid is 65535, and the maximum number of resident blocks per multiprocessor is 8.
I am confused about how many blocks I can launch. If the maximum number of blocks per SM is 8, doesn't that mean I can launch at most 16 blocks when there are 2 SMs?"
The maximum number of blocks (per dimension in a grid) is a limit on what the CUDA scheduler can handle. Except on the recent Kepler GPUs (compute capability 3.0 and later, where the x dimension allows up to 2^31 - 1 blocks), this limit is 65535 along each dimension.
In practice, the number of active blocks depends on a lot of things. There is a hard limit on the number of blocks each SM can hold at once, but the actual number can be smaller if your kernel uses large amounts of shared memory, many registers, or many threads per block.
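The scheduler switches away from inactive work (i.e. warps of resident blocks that are stalling for various reasons) and switches in work that is ready to run. A resident block keeps its place on the SM until it finishes, and a waiting block is then assigned to the freed slot. So you can launch far more blocks than can be resident at once, and as many as physically possible are kept resident to keep the SMs as busy as possible.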
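To see these limits on a particular device, a minimal sketch along the following lines may help. It assumes CUDA 6.5 or later (for `cudaOccupancyMaxActiveBlocksPerMultiprocessor`); the kernel `scaleKernel`, the helper `roundsNeeded`, the block size of 256 and the grid size of 1000 are all hypothetical values chosen for illustration, not taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; it exists only so the occupancy query has a target.
__global__ void scaleKernel(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Rough model: how many "rounds" of resident blocks a grid needs if every
// SM always holds blocksPerSM blocks. Real blocks retire one at a time,
// so this is only an estimate of how oversubscribed the device is.
int roundsNeeded(int gridBlocks, int smCount, int blocksPerSM) {
    int residentAtOnce = smCount * blocksPerSM;
    return (gridBlocks + residentAtOnce - 1) / residentAtOnce;  // ceiling division
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "No usable CUDA device\n");
        return 1;
    }

    const int blockSize = 256;    // threads per block (assumed)
    const int gridBlocks = 1000;  // blocks in the launch (assumed)

    // Ask the runtime how many blocks of this kernel fit on one SM,
    // taking registers, shared memory and thread count into account.
    int blocksPerSM = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, scaleKernel, blockSize, 0 /* dynamic shared memory */);
    if (err != cudaSuccess || blocksPerSM == 0) {
        std::fprintf(stderr, "Occupancy query failed\n");
        return 1;
    }

    std::printf("SMs: %d\n", prop.multiProcessorCount);
    std::printf("Max grid size in x: %d\n", prop.maxGridSize[0]);
    std::printf("Resident blocks per SM for this kernel: %d\n", blocksPerSM);
    std::printf("Rounds needed for %d blocks: %d\n", gridBlocks,
                roundsNeeded(gridBlocks, prop.multiProcessorCount, blocksPerSM));
    return 0;
}
```

On a compute capability 2.x device, which allows 1536 resident threads per SM, 256-thread blocks would be capped at no more than 6 per SM by the thread limit alone, which is below the 8-block limit quoted in the question.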
"But that brings synchronization problems..."
Never assume that CUDA blocks are launched in order. They can be processed out of order, and the only synchronization point is to finish the kernel and call cudaDeviceSynchronize on the host.
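As an illustration of that rule, here is a small sketch in which every block writes only its own output slot, so the result does not depend on the order in which blocks run. The names `recordBlock` and `runIndependentBlocks`, and the choice of 1024 blocks of 64 threads, are hypothetical and chosen only to launch many more blocks than can be resident on 2 SMs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block writes only its own slot, so the result is the same
// whatever order the blocks are scheduled in.
__global__ void recordBlock(int *out) {
    if (threadIdx.x == 0) out[blockIdx.x] = blockIdx.x * 2;
}

// Launches numBlocks blocks and returns true if every slot was written.
bool runIndependentBlocks(int numBlocks) {
    int *dOut = nullptr;
    size_t bytes = numBlocks * sizeof(int);
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) return false;
    cudaMemset(dOut, 0xFF, bytes);  // fill with -1 so unwritten slots are detectable

    recordBlock<<<numBlocks, 64>>>(dOut);
    cudaError_t err = cudaGetLastError();  // catches invalid launch configurations

    // The launch is asynchronous. This is the point where the host waits
    // for all blocks to finish. (The cudaMemcpy below would also wait, but
    // the explicit call makes the synchronization point visible.)
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<int> hOut(numBlocks);
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < numBlocks; ++b)
        if (hOut[b] != b * 2) return false;
    return true;
}

int main() {
    bool ok = runIndependentBlocks(1024);
    std::printf("%s\n", ok ? "all blocks completed" : "failure");
    return ok ? 0 : 1;
}
```

If one block needed a value produced by another block, this pattern would no longer be safe within a single launch; splitting the work into two kernels, with the first finishing before the second starts, is the usual way to get that ordering.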