

GeForce GTX 980 Whitepaper

GM204 HARDWARE ARCHITECTURE IN-DEPTH


from 32 to 64. Again, thanks to the added benefit of higher clocks, pixel fill-rate is actually more than
double that of GTX 680: 72 Gpixels/sec for GTX 980 versus 32.2 Gpixels/sec for GTX 680.
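The fill-rate figures above can be approximated as ROP count multiplied by core clock. The following is a minimal Python sketch, assuming one pixel per ROP per clock and base clocks of 1126 MHz for GTX 980 and 1006 MHz for GTX 680. The clocks are not stated in this section, and the function name is illustrative only.

```python
def pixel_fill_rate_gpixels(rops: int, core_clock_mhz: float) -> float:
    """Peak pixel fill-rate in Gpixels/sec, assuming one pixel per ROP per clock."""
    return rops * core_clock_mhz / 1000.0


# ROP counts come from the text (32 -> 64); base clocks are assumptions.
gtx_980 = pixel_fill_rate_gpixels(rops=64, core_clock_mhz=1126)
gtx_680 = pixel_fill_rate_gpixels(rops=32, core_clock_mhz=1006)

print(f"GTX 980: {gtx_980:.1f} Gpixels/sec")
print(f"GTX 680: {gtx_680:.1f} Gpixels/sec")
print(f"Ratio:   {gtx_980 / gtx_680:.2f}x")
```

Under these assumptions the ratio comes out above 2x, because the doubled ROP count is combined with a higher core clock.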

The memory subsystem has also been significantly revamped. GTX 980’s memory clock is over 15%
higher than GTX 680, and GM204’s cache is larger and more efficient than Kepler’s design, reducing the
number of memory requests that have to be made to DRAM. Improvements in our implementation of
memory compression provide a further benefit in reducing DRAM traffic—effectively amplifying the raw
DRAM bandwidth in the system.
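The bandwidth argument above can be worked through numerically. This is a rough Python sketch, assuming a 256-bit memory interface on both cards, per-pin data rates of 7.0 Gbps (GTX 980) and 6.0 Gbps (GTX 680), and the roughly 25% reduction in bytes per frame that the whitepaper reports in its memory subsystem discussion. None of these numbers appear in this section, and the helper names are hypothetical.

```python
def raw_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Raw DRAM bandwidth in GB/sec: per-pin data rate times bus width in bytes."""
    return data_rate_gbps * bus_width_bits / 8


def equivalent_data_rate_gbps(data_rate_gbps: float, traffic_reduction: float) -> float:
    """Data rate a system without the traffic reduction would need for the same work."""
    return data_rate_gbps / (1.0 - traffic_reduction)


BUS_WIDTH_BITS = 256        # assumption: same bus width on both cards
GTX_980_RATE_GBPS = 7.0     # assumption
GTX_680_RATE_GBPS = 6.0     # assumption
TRAFFIC_REDUCTION = 0.25    # assumption: ~25% fewer bytes per frame

clock_gain = GTX_980_RATE_GBPS / GTX_680_RATE_GBPS - 1.0
raw_980 = raw_bandwidth_gbs(GTX_980_RATE_GBPS, BUS_WIDTH_BITS)
raw_680 = raw_bandwidth_gbs(GTX_680_RATE_GBPS, BUS_WIDTH_BITS)
effective_rate = equivalent_data_rate_gbps(GTX_980_RATE_GBPS, TRAFFIC_REDUCTION)

print(f"Memory clock gain:       {clock_gain:.1%}")
print(f"Raw bandwidth:           {raw_980:.0f} GB/sec vs {raw_680:.0f} GB/sec")
print(f"Equivalent Kepler-style: {effective_rate:.1f} Gbps")
```

The point of the sketch is that reduced DRAM traffic acts like a multiplier on the raw data rate: moving 25% fewer bytes is equivalent to running an unimproved memory system about a third faster.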

Maxwell Streaming Multiprocessor

The SM is the heart of our GPUs. Almost
every operation flows through the SM at
some point in the rendering pipeline.
Maxwell GPUs feature a new SM that’s
been designed to provide dramatically
better performance per watt than prior
GeForce GPUs.

Compared to GPUs based on our Kepler
architecture, Maxwell’s new SMM design
has been reconfigured to improve
efficiency. Each SMM contains four warp
schedulers, and each warp scheduler is
capable of dispatching two instructions per
warp every clock. Compared to Kepler’s
scheduling logic, we’ve integrated a
number of improvements in the scheduler
to further reduce redundant recomputation
of scheduling decisions,
improving energy efficiency. We’ve also
integrated a completely new datapath
organization. Whereas Kepler’s SM shipped
with 192 CUDA Cores—a non-power-of-two
organization—the Maxwell SMM is
partitioned into four distinct 32-CUDA core
processing blocks (128 CUDA cores total
per SM), each with its own dedicated
resources for scheduling and instruction
buffering. This new configuration in
Maxwell aligns with warp size, making it
easier to utilize efficiently and saving area

Figure 3: GM204 SMM Diagram (GM204 also features 4 DP units per SMM, which are not depicted on this diagram)
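The SMM organization described above can be checked with a few lines of arithmetic. The sketch below is illustrative Python, not NVIDIA code. The Maxwell numbers (four schedulers, dual dispatch, four 32-core blocks) and Kepler's 192 cores come from the text. Kepler's scheduler count and dispatch width, the 16-SMM total for GM204, and all class and property names are assumptions introduced for the example.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp


@dataclass(frozen=True)
class SMConfig:
    name: str
    cuda_cores: int
    warp_schedulers: int
    dispatch_per_scheduler: int

    @property
    def cores_per_scheduler(self) -> int:
        # Average share of CUDA cores per warp scheduler.
        return self.cuda_cores // self.warp_schedulers

    @property
    def one_warp_per_scheduler(self) -> bool:
        # True when each scheduler's share of cores equals exactly one warp.
        return self.cores_per_scheduler == WARP_SIZE

    @property
    def is_power_of_two(self) -> bool:
        n = self.cuda_cores
        return n > 0 and n & (n - 1) == 0

    @property
    def peak_dispatch_per_clock(self) -> int:
        return self.warp_schedulers * self.dispatch_per_scheduler


# Maxwell: four 32-core processing blocks, one scheduler each (from the text).
maxwell = SMConfig("Maxwell SMM", cuda_cores=4 * 32, warp_schedulers=4,
                   dispatch_per_scheduler=2)
# Kepler: 192 cores from the text; scheduler figures are assumptions.
kepler = SMConfig("Kepler SMX", cuda_cores=192, warp_schedulers=4,
                  dispatch_per_scheduler=2)

SMM_COUNT = 16  # assumption: full GM204 configuration
gm204_total_cores = SMM_COUNT * maxwell.cuda_cores

for sm in (kepler, maxwell):
    print(f"{sm.name}: {sm.cuda_cores} cores, "
          f"{sm.cores_per_scheduler} per scheduler, "
          f"one warp per scheduler={sm.one_warp_per_scheduler}, "
          f"power of two={sm.is_power_of_two}, "
          f"peak dispatch/clock={sm.peak_dispatch_per_clock}")
print(f"GM204 total CUDA cores: {gm204_total_cores}")
```

With these inputs, only the Maxwell configuration gives each scheduler a core count equal to the warp size, which is the alignment the text credits for easier, more efficient utilization.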


