Sample Large DataFrames

DEX’s grid visualizes at most 50,000 rows by default, although this limit can be changed in the dx settings. If a user attempts to render a pandas DataFrame with more than 50,000 rows, a random sample of 50,000 rows is displayed in the grid instead. The user will know they are only viewing a sample because of the “beaker” affordances shown here:
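The display rule above can be illustrated with plain pandas. This is a minimal sketch of the idea, not DEX’s actual implementation: the function name `sample_for_grid` and the constant `DEFAULT_ROW_LIMIT` are hypothetical, and the only library call assumed is pandas’ own `DataFrame.sample`.

```python
from typing import Optional, Tuple

import pandas as pd

# Hypothetical constant mirroring the default grid limit described above.
DEFAULT_ROW_LIMIT = 50_000


def sample_for_grid(
    df: pd.DataFrame,
    row_limit: int = DEFAULT_ROW_LIMIT,
    seed: Optional[int] = None,
) -> Tuple[pd.DataFrame, bool]:
    """Return (rows_to_display, is_sample) for a DataFrame."""
    if len(df) <= row_limit:
        # Small enough to show in full; no sampling needed.
        return df, False
    # Too large: draw a uniform random sample without replacement.
    return df.sample(n=row_limit, random_state=seed), True


big = pd.DataFrame({"value": range(120_000)})
shown, is_sample = sample_for_grid(big, seed=0)
print(len(shown), is_sample)
```

The boolean flag plays the role of the “beaker” indicator: it tells the caller whether the rows on screen are the full DataFrame or only a sample of it.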

How to collect a new sample with push-down filters applied

When a user applies new filters, they can re-sample the original DataFrame using those filters.
Once new filters are applied, the sampling button becomes active and its tooltip reads “You have made updates to your filters, you can collect a new sample.”

Collecting a new sample

  • Click on the button shown above, or alternatively, click on the “cog” icon to open the DEX right drawer, and then click on the “Sampling” option.
  • Pick the sample size you’d like returned; the default is 50,000, and the sample can never be larger than the original DataFrame.
  • All currently applied filters will be used when collecting the new sample.
  • Click on “Collect” to collect the new sample.
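Conceptually, the steps above amount to filtering the full DataFrame first and then drawing a fresh sample from the rows that remain. The sketch below shows that order of operations in plain pandas. It is an illustration under stated assumptions rather than DEX’s code: `collect_sample` and its parameters are hypothetical names, and capping the sample at the number of matching rows is an assumption consistent with the size limit mentioned above.

```python
from typing import Optional

import pandas as pd


def collect_sample(
    df: pd.DataFrame,
    filter_mask: pd.Series,
    sample_size: int = 50_000,
    seed: Optional[int] = None,
) -> pd.DataFrame:
    """Filter the original DataFrame, then sample from the filtered rows."""
    # Apply the filters to the full data first, so the sample is drawn
    # only from rows that match them.
    filtered = df[filter_mask]
    # Assumption: the sample cannot be larger than the rows available.
    n = min(sample_size, len(filtered))
    return filtered.sample(n=n, random_state=seed)


orders = pd.DataFrame({"amount": range(200_000)})

# 100,000 rows match this filter, so the default size of 50,000 is returned.
large_orders = collect_sample(orders, orders["amount"] >= 100_000, seed=0)

# Only 10 rows match this filter, so the sample shrinks to 10 rows.
few_orders = collect_sample(orders, orders["amount"] < 10)
```

Filtering before sampling matters: sampling first and filtering afterwards would return only the matching rows that happened to land in the earlier sample, which can be far fewer than requested.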

After your sample is collected

DEX reloads the grid with the newly sampled data, which reflects your current filters. Information about the sample then becomes available in the Sampling area of the right drawer, with metrics reported relative to the original (full) dataset.
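As a rough illustration of what “relative to the original dataset” can mean, the sketch below computes a few ratios from row counts. The specific metrics DEX displays are not listed here, so the `SampleInfo` class and all of its fields are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class SampleInfo:
    """Hypothetical summary of a sample relative to the full dataset."""

    original_rows: int  # rows in the full DataFrame
    filtered_rows: int  # rows matching the current filters
    sample_rows: int    # rows actually returned in the sample

    @property
    def fraction_of_original(self) -> float:
        # Share of the full dataset that the sample represents.
        if self.original_rows == 0:
            return 0.0
        return self.sample_rows / self.original_rows

    @property
    def fraction_of_filtered(self) -> float:
        # Share of the filter-matching rows that the sample represents.
        if self.filtered_rows == 0:
            return 0.0
        return self.sample_rows / self.filtered_rows


info = SampleInfo(original_rows=200_000, filtered_rows=100_000, sample_rows=50_000)
print(f"{info.fraction_of_original:.0%} of the original dataset")
print(f"{info.fraction_of_filtered:.0%} of the rows matching the filters")
```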

To resample

Filters applied in any previous sample will persist, so users can resample at any point by removing the existing filters, adding new filters, or both before going through the sampling flow described above.