Large datasets increasingly provide critical insights into crustal and surface processes on Earth. These data come in the form of published and contributed observations, which often include associated metadata. Even in the best-case scenario of a carefully curated dataset, extracting meaningful results from such compilations may be non-trivial, and choices made with respect to filtering, resampling, and averaging can affect the resulting trends and any interpretations thereof. As a result, a thorough understanding of how to digest, process, and analyze large data compilations is required. Here, we present a generalizable workflow developed using the Sedimentary Geochemistry and Paleoenvironments Project (SGP) database. We demonstrate the effects of filtering and weighted resampling on Al₂O₃ and U contents, two representative components of interest in sedimentary geochemistry (a major-element oxide and a trace element, respectively). Through these analyses, we highlight several methodological challenges in a “bigger data” approach to Earth science. We suggest that, with slight modifications to our workflow, researchers can confidently use large collections of observations to gain new insights into the processes that have shaped Earth’s crustal and surface environments.
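To make the filtering and weighted-resampling steps concrete, here is a minimal sketch in Python with NumPy. It assumes an inverse-proximity weighting in space and age (one common choice in statistical geochemistry, in which samples clustered near many others receive less weight) followed by a weighted bootstrap of the mean. The function names, the scale parameters `spatial_scale` and `age_scale`, and the 0 to 100 wt% filter bounds are hypothetical choices for illustration. They are not the SGP workflow's actual code or settings.

```python
import numpy as np


def filter_values(values, lat, lon, age, lower=0.0, upper=100.0):
    """Drop samples with missing metadata or values outside [lower, upper].

    The bounds are hypothetical (e.g. 0-100 wt% for an oxide such as Al2O3).
    """
    mask = (np.isfinite(values) & np.isfinite(lat)
            & np.isfinite(lon) & np.isfinite(age))
    mask &= (values >= lower) & (values <= upper)
    return values[mask], lat[mask], lon[mask], age[mask]


def proximity_weights(lat, lon, age, spatial_scale=1.8, age_scale=38.0):
    """Hypothetical inverse-proximity weights (scales are assumptions).

    Each sample's weight is the reciprocal of its summed proximity to all
    samples (itself included), so densely sampled localities and time
    intervals are down-weighted.
    """
    lat_r, lon_r = np.radians(lat), np.radians(lon)
    # Great-circle angular distance (degrees) between every pair of samples
    cos_d = (np.sin(lat_r)[:, None] * np.sin(lat_r)[None, :]
             + np.cos(lat_r)[:, None] * np.cos(lat_r)[None, :]
             * np.cos(lon_r[:, None] - lon_r[None, :]))
    dist = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))
    # Absolute age difference between every pair of samples
    dt = np.abs(age[:, None] - age[None, :])
    # Cauchy-like kernels: each equals 1 at zero separation and decays with distance
    prox = (1.0 / ((dist / spatial_scale) ** 2 + 1.0)
            + 1.0 / ((dt / age_scale) ** 2 + 1.0))
    return 1.0 / prox.sum(axis=1)


def weighted_bootstrap_mean(values, weights, n_resamples=1000, rng=None):
    """Resample with replacement, with selection probability proportional to weight.

    Returns the mean and standard deviation of the resampled means.
    """
    rng = np.random.default_rng(rng)
    p = weights / weights.sum()
    n = len(values)
    means = np.empty(n_resamples)
    for k in range(n_resamples):
        idx = rng.choice(n, size=n, replace=True, p=p)
        means[k] = values[idx].mean()
    return means.mean(), means.std()
```

Because each sample's weight falls as more samples cluster near it in space and age, a single densely sampled section contributes roughly as much to the resampled mean as an isolated sample, rather than dominating it. Changing `spatial_scale`, `age_scale`, or the filter bounds changes the result. That sensitivity is exactly the kind of methodological choice the abstract flags.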