This is the first public paper, published by NASA, that mention the word “Big Data“. Actually it’s not related to data processing but is like the beginning for this funny buzz word 🙂
In the area of scientific visualization, input data sets are often very large. In visualization of Computational Fluid Dynamics (CFD) in particular, input data sets today can surpass 100 Gbytes, and are expected to scale with the ability of supercomputers to generate them. Some visualization tools already partition large data sets into segments, and load appropriate segments as they are needed.
However, this does not remove the problem for two reasons: 1) there are data sets for which even the individual segments are too large for the largest graphics workstations, 2) many practitioners do not have access to workstations with the memory capacity required to load even a segment, especially since the state-of-the-art visualization tools tend to be developed by researchers with much more powerful machines. When the size of the data that must be accessed is larger than the size of memory, some form of virtual memory is simply required. This may be by segmentation, paging, or by paged segments.
In this paper we demonstrate that complete reliance on operating system virtual memory for out-of-core visualization leads to egregious performance. We then describe a paged segment system that we have implemented, and explore the principles of memory management that can be employed by the application for out-of-core visualization.
We show that application control over some of these can significantly improve performance. We show that sparse traversal can be exploited by loading only those data actually required. We show also that application control over data loading can be exploited by 1) loading data from alternative storage format (in particular 3-dimensional data stored in subcubes), 2) controlling the page size.
Both of these techniques effectively reduce the total memory required by visualization at run-time. We also describe experiments we have done on remote out-of-core visualization (when pages are read by demand from remote disk) whose results are promising.