
Deferred+: Next-Gen Culling and Rendering for Dawn Engine

Posted by JN Bucci & H. Doghramachi

Introduction
To satisfy growing demands on visual fidelity and runtime performance, we investigated a new culling and rendering system for future use in the Dawn Engine. This was one of many research projects carried out by our internal R&D team, LABS, for the Deus Ex Universe, but it is not used in the upcoming game Deus Ex: Mankind Divided. A major goal of this investigation was to develop a system that is compatible with the existing asset pipeline and allows for fast iteration times during game production. Our culling system combines the low latency and low overhead of a hierarchical depth buffer based approach [Hill and Collin 11] with the pixel accuracy of conventional GPU hardware occlusion queries. It efficiently culls highly dynamic, complex environments while maintaining compatibility with standard mesh assets. Our rendering system takes a practical approach to the idea of deferred texturing [Reed 14] and efficiently supports highly diverse and complex materials while using conventional texture assets. Both the culling and the rendering system make use of new graphics capabilities available with DirectX 12, most notably enhanced indirect rendering and the new shader resource binding model.

Culling
Our culling system is partially based on the ideas presented by [Haar and Aaltonen 15], where the depth buffer from the previous frame is used to acquire an initial visibility and potential false negatives are retested with the updated depth buffer from the current frame. In this way we avoid rendering dedicated occlusion geometry, which may be difficult to generate, e.g. for natural environments. However, instead of using a hierarchical depth buffer based approach and subdividing meshes into small clusters, we use a concept in the spirit of [Kubisch and Tavenrath 14] that relies on the early depth-stencil testing capabilities of modern consumer graphics hardware. For this, the oriented bounding boxes of the occludees are rendered against the reprojected depth buffer from the previous frame, and the associated pixel shader is forced to use early depth-stencil testing. As a result, only fragments that pass the depth test execute the pixel shader, which marks the corresponding instance as visible in a common GPU buffer at a location unique to each mesh instance. A subsequent compute shader uses the acquired visibility information to generate the data for indirect rendering. As proposed by [Haar and Aaltonen 15], instances found to be occluded are then retested with the updated depth buffer from the current frame to recover false negatives. Figure 1 gives an overview of the involved steps and resources.

 

Figure 1. Overview of the culling system. Arrows on the left side of each culling step represent input data, arrows on the right side output data. The colors match those of the corresponding GPU buffers.

Scenes built with the current asset pipeline of the Dawn Engine already consist of relatively small modular blocks, so with this approach we could avoid introducing a system for subdividing meshes into small clusters. By replacing hierarchical depth buffer based culling with the early depth-stencil based approach, we achieved, in a natural jungle environment without mesh clustering, on average 2.3x higher culling rates and 1.6x faster frame times.
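
The two-phase visibility test described in this section can be modelled on the CPU to make the data flow easier to follow. The following is a minimal sketch under several assumptions, not the engine's implementation: bounding boxes are reduced to axis-aligned pixel rectangles with a single nearest depth, reprojection of the previous depth buffer is skipped (static camera), the box depth stands in for the depth of the real mesh, and all names (Instance, mark_visible_boxes, cull_and_draw) are hypothetical.

```python
from dataclasses import dataclass

FAR = 1.0  # depth of an empty pixel; smaller values are closer to the camera


@dataclass
class Instance:
    rect: tuple   # (x0, y0, x1, y1) pixel footprint of the bounding box, max exclusive
    depth: float  # nearest depth of the bounding box


def mark_visible_boxes(instances, candidates, depth, visibility):
    """Stand-in for rasterizing bounding boxes with early depth testing.

    Only fragments that pass the depth test reach the "pixel shader",
    which writes a flag at the slot unique to the instance.
    """
    for i in candidates:
        x0, y0, x1, y1 = instances[i].rect
        for y in range(y0, y1):
            for x in range(x0, x1):
                if instances[i].depth <= depth[y][x]:
                    visibility[i] = 1


def draw(instances, draw_list, depth):
    """Stand-in for the indirect geometry pass; it only updates depth here."""
    for i in draw_list:
        x0, y0, x1, y1 = instances[i].rect
        for y in range(y0, y1):
            for x in range(x0, x1):
                depth[y][x] = min(depth[y][x], instances[i].depth)


def cull_and_draw(instances, prev_depth):
    height, width = len(prev_depth), len(prev_depth[0])
    visibility = [0] * len(instances)
    everything = list(range(len(instances)))

    # Phase 1: test all boxes against the depth buffer of the previous frame.
    mark_visible_boxes(instances, everything, prev_depth, visibility)
    # Stand-in for the compute pass that compacts visible instances
    # into the data used for indirect rendering.
    first_pass = [i for i in everything if visibility[i]]
    depth = [[FAR] * width for _ in range(height)]
    draw(instances, first_pass, depth)

    # Phase 2: retest instances that looked occluded against the updated
    # depth buffer of the current frame to recover false negatives.
    occluded = [i for i in everything if not visibility[i]]
    mark_visible_boxes(instances, occluded, depth, visibility)
    second_pass = [i for i in occluded if visibility[i]]
    draw(instances, second_pass, depth)
    return first_pass, second_pass
```

In this model, an instance hidden by an occluder that existed only in the previous frame fails the first test but passes the second, which is the false-negative case the retest is meant to catch.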

Rendering
For modern games, it is important to use a rendering system that can handle increasingly complex mesh geometry and realistic surface materials. Forward rendering systems support high material diversity but either suffer from overdraw or require a depth pre-pass, which can be expensive for meshes with a high triangle count, GPU hardware tessellation, alpha-testing or vertex-shader skinning. Deferred rendering systems run efficiently without a depth pre-pass, but only support a limited range of materials and therefore often require additional forward rendering for more diverse materials. Our practical approach to deferred texturing combines the strengths of both rendering systems by supporting a high diversity of materials while only performing a single geometry pass. We go one step further than traditional deferred rendering and completely decouple geometry from materials and lighting. In an initial geometry pass, all mesh instances that pass the GPU culling stage are rendered indirectly and their vertex attributes are written, in compressed form, into a set of geometry buffers. No material-specific operations or texture fetches are performed (except for alpha-testing and certain kinds of GPU hardware tessellation techniques). A subsequent full-screen pass transfers a material ID from the geometry buffers into a 16-bit depth buffer. Finally, in the shading pass, a screen-space rectangle that encloses the boundaries of all visible meshes is rendered for each material. The depth of the rectangle vertices is set to a value that corresponds to the currently processed material ID, and early depth-stencil testing is used to reject pixels from other materials. All standard materials that use the same shader and resource binding layout are rendered in a single pass via dynamically indexed textures. At this point, material-specific rendering and lighting (e.g. tiled [Billeter et al. 13] or clustered [Olsson et al. 12]) are done simultaneously. Figure 2 gives an overview of the rendering process.

 

Figure 2. Overview of the rendering system.
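
The material rejection step of the shading pass can be illustrated with a small CPU model. This is a sketch under stated assumptions rather than the actual renderer: the 16-bit depth buffer is modelled as raw integer codes, the depth comparison is assumed to be an EQUAL test, the rectangle is derived from the pixels of each material instead of from mesh bounds, and the function names are hypothetical.

```python
D16_MAX = 0xFFFF  # largest code a 16-bit depth buffer can hold


def material_depth_code(material_id):
    """Map a material ID to the depth code used for rejection."""
    if not 0 <= material_id <= D16_MAX:
        raise ValueError("material ID does not fit into 16 bits")
    return material_id


def bounding_rect(material_ids, material_id):
    """Rectangle enclosing every pixel of one material, or None if unused."""
    coords = [
        (x, y)
        for y, row in enumerate(material_ids)
        for x, m in enumerate(row)
        if m == material_id
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def shade_frame(material_ids, shaders):
    height, width = len(material_ids), len(material_ids[0])
    # Full-screen pass: transfer the material IDs into the depth buffer.
    depth = [[material_depth_code(m) for m in row] for row in material_ids]
    color = [[None] * width for _ in range(height)]
    invocations = 0

    # Shading pass: one rectangle per material.
    for material_id, shader in shaders.items():
        rect = bounding_rect(material_ids, material_id)
        if rect is None:
            continue
        quad_depth = material_depth_code(material_id)
        x0, y0, x1, y1 = rect
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Early depth test (assumed EQUAL): pixels that belong to
                # other materials are rejected before the shader runs.
                if depth[y][x] != quad_depth:
                    continue
                color[y][x] = shader(x, y)
                invocations += 1
    return color, invocations
```

Because every pixel carries exactly one material code, each covered pixel is shaded once no matter how many material rectangles overlap it.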

Since the memory bandwidth of current consumer graphics hardware is generally much more limited than its computational power, it is important to keep the geometry buffers as small as possible, so vertex attributes need to be stored in a compressed form. We store texture coordinates in 2x 16 bits by keeping only their fractional part after interpolation. Since the derivatives of the original texture coordinates are stored alongside, no seams will be visible later on. In theory, derivatives can be reconstructed in the shading pass from neighboring texture coordinates, but at geometry edges, where appropriate neighbor texture coordinates cannot always be obtained, artifacts will be visible. This is especially noticeable under camera motion with dense alpha-tested foliage. Therefore we decided to store texture coordinates along with their derivatives. For that we treat the derivatives in X- and Y-direction as 2D vectors. By decoupling the vector length from the orientation, the vector lengths can be stored as 2x 16 bits and the orientations as 2x 8 bits, which still gives enough precision for anisotropic texture filtering.

We looked into storing the tangent space (tangent, bitangent, normal) as a quaternion in 32 bits according to [McAuley 15] but dismissed this approach due to visible faceting on smooth shiny surfaces. Instead we store the tangent space in an axis-angle representation, where the normal is stored as the axis using octahedron normal vector encoding [Meyer et al. 10] and the tangent is stored as an angle. In this way we could store the entire tangent space in 32 bits and achieve the same quality as storing the tangent space uncompressed in 3x 30 bits. Encoding the tangent space this way also requires about half the instruction count of converting a TBN matrix into a quaternion in a mathematically stable, precise manner and packing it into 32 bits.

Pros and Cons
Below we summarize the most important pros and cons of the presented culling and rendering systems.

Culling

Pros:

• Same pixel accuracy as with conventional GPU hardware occlusion queries, but without latency issues (popping).
• Support of highly dynamic, complex, alpha-tested occluders without needing to author and render dedicated occluder geometry.
• High culling efficiency even without mesh clustering for modular composited scenes, thus fully compatible with standard asset pipelines.
• Low performance overhead of culling system.
• Number of draw calls massively reduced (performance benefit even with low-overhead graphics APIs such as DirectX 12 and Vulkan).

Cons:

• Draw commands are no longer in deterministic order (nearly coplanar surfaces are more likely to cause Z-fighting and should be avoided).
• Depth sorting of draw calls no longer given, causing higher overdraw (less problematic with deferred+ due to lightweight geometry pass and high culling efficiency).

Rendering

Pros:

• Due to lightweight geometry pass, depth pre-pass no longer required.
• GPU warp utilization for applying materials and lighting significantly better than with clustered forward shading [Olsson et al. 12], thus small triangles are far less problematic and GPU hardware tessellation performs much better.
• Unified rendering system that, in contrast to deferred rendering, can handle highly diverse range of materials efficiently.
• Geometry processing completely decoupled from material rendering and lighting, resulting in fewer shader permutations and faster iteration times in game production.
• By decoupling geometry processing from material rendering, switching of GPU resources significantly reduced.
• In contrast to systems with deferred vertex attribute fetching [Burns and Hunt 13], geometry information only fetched once per frame in cache-friendly, coherent manner.
• Compressed texture data doesn’t need to be decompressed into GPU memory as with deferred rendering, thus texture memory bandwidth significantly reduced.
• Modified geometry buffers contain useful information not available with deferred rendering:

     o Texture coordinate derivatives (fix mip-mapping issues with deferred decals)
     o Vertex normals (enhance screen-space ambient occlusion techniques)
     o Vertex tangents (anisotropic lighting)

• Not dependent on vendor-specific graphics features and compatible with the entire range of DirectX 12 capable graphics hardware (when the supported range of dynamically indexed textures is too low, applications can still fall back to rendering common materials separately).

Cons:

• Vertex attributes much more limited in comparison to traditional rendering techniques.
• Transparent objects have to be handled separately.
• Antialiasing still difficult to handle.

Results
To capture the results, we used scenes from the game Deus Ex: Mankind Divided that we converted into a format which we could load and render in an experimental framework based on DirectX 12. Our test machine used an AMD Radeon R9 390 graphics card and the screen resolution was set to 1920x1080. In the video below you can see a scene rendered with 1024 dynamic spherical area lights using clustered lighting. To simulate dynamic objects for occlusion culling, the source of each area light is rendered as an emissive sphere. To be able to compare our rendering system with a reference clustered forward renderer while using GPU culling, all materials except the emissive sphere material use the same shading code and, for deferred+, are rendered in separate passes with the help of the described depth-stencil rejection method. We also used 8x anisotropic texture filtering for all material textures to ensure that our texture derivative compression method doesn’t produce artifacts. The frame time and GPU times displayed on the left side of the screen are measured in milliseconds, and the culling counters in the middle of the screen only take into consideration meshes that were frustum-culled on the CPU.

At the beginning of the video we toggle GPU culling on and off, which reveals a performance gain of approximately 1.44 ms and a culling efficiency of approximately 80%. Then we display the boundaries of culled objects as red wireframe boxes. Finally, we compare the rendering times of deferred+ with clustered forward rendering, which shows that deferred+ runs approximately 4.31 ms faster while producing results of almost equivalent quality. With adaptive GPU hardware tessellation enabled, using a maximum tessellation factor of 5, deferred+ even runs 24.98 ms faster. Under realistic game conditions with more complex materials and more complex lighting (different light types, shadow mapping), the performance benefit of deferred+ should be even more prominent.

Conclusion
We presented a system for culling and rendering complex, highly dynamic scenes, which makes use of new graphics capabilities available with DirectX 12. The culling system provides high culling efficiency even for non-clustered, traditional mesh assets while having a low overhead. The rendering system outperforms a clustered forward rendering system of comparable quality, even more so when GPU hardware tessellation techniques are employed. It is fully compatible with conventional texture assets, doesn’t depend on vendor-specific graphics features and can run on the entire range of DirectX 12 capable graphics hardware. By combining the proposed culling and rendering systems, it is possible to render an entire complex scene in just a few draw calls.

Acknowledgment
We would like to thank Francis Maheux for providing us with the assets for the prototype, and Samuel Delmont and Uriel Doyon for their valuable input on the implementation itself. Furthermore, we would like to thank Eidos Montreal and Square Enix for allowing us to make the results of this research project public.

Along with Wolfgang Engel, Eidos-Montréal LABS is working on an in-depth article describing our work for the upcoming book GPU Zen, due out in 2017.

          


References
[Billeter et al. 13] M. Billeter, O. Olsson and U. Assarsson. “Tiled Forward Shading”. GPU Pro 4: Advanced Rendering Techniques. A K Peters, pp. 99–114. 2013.
[Burns and Hunt 13] C. A. Burns and W. A. Hunt. “The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading”. Journal of Computer Graphics Techniques, Vol. 2, No. 2. 2013.
[Haar and Aaltonen 15] U. Haar and S. Aaltonen. “GPU-Driven Rendering Pipelines”. In ACM SIGGRAPH 2015 Talks, ACM, Los Angeles, USA, SIGGRAPH ’15. 2015.
[Hill and Collin 11] S. Hill and D. Collin. “Practical, Dynamic Visibility for Games”. GPU Pro 2. A K Peters, pp. 329–347. 2011.
[Kubisch and Tavenrath 14] C. Kubisch and M. Tavenrath. “OpenGL 4.4 Scene Rendering Techniques”. In GPU Technology Conference 2014 Presentation, NVIDIA, San Jose, USA, page 50. 2014.
[McAuley 15] S. McAuley. “Rendering the World of Far Cry 4”. In Game Developers Conference 2015 Talks, San Francisco. 2015.
[Meyer et al. 10] Q. Meyer, J. Süßmuth, G. Sußner, M. Stamminger and G. Greiner. “On Floating-Point Normal Vectors”. In Eurographics Symposium on Rendering, Vol. 29, No. 4. 2010.
[Olsson et al. 12] O. Olsson, M. Billeter and U. Assarsson. “Clustered Deferred and Forward Shading”. In HPG ’12: Proceedings of the Fourth ACM SIGGRAPH/Eurographics Conference on High Performance Graphics, ACM, pp. 87–96. 2012.
[Reed 14] N. Reed. “Deferred Texturing”. Blog post, 2014, http://www.reedbeta.com/blog/2014/03/25/deferred-texturing

 
