Datasets

Datasets allow users to upload and store large files (over 100MB, up to a maximum of 45GB) without loading them directly into the kernel. Users can then selectively load files into the kernel as needed, which helps manage memory usage.
This article describes how to create, read from, write to, and delete datasets.
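The size range above can be checked locally before an upload. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes decimal units (1MB = 10**6 bytes) and a per-file limit, neither of which this article states explicitly.

```python
from pathlib import Path

# Assumption: decimal units. The article does not say whether the limits
# are decimal (MB/GB) or binary (MiB/GiB).
LARGE_FILE_BYTES = 100 * 10**6       # "over 100MB"
MAX_DATASET_FILE_BYTES = 45 * 10**9  # "maximum of 45GB"


def classify_size(size_bytes: int) -> str:
    """Classify a file size against the limits quoted in this article.

    Returns "small" (100MB or less), "dataset" (over 100MB and at most
    45GB), or "too_large" (over 45GB).
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > MAX_DATASET_FILE_BYTES:
        return "too_large"
    if size_bytes > LARGE_FILE_BYTES:
        return "dataset"
    return "small"


def classify_file(path) -> str:
    """Classify a file on disk by its size in bytes."""
    return classify_size(Path(path).stat().st_size)
```

For example, classify_file("my_data.csv") (a hypothetical file name) returns "dataset" for a 2GB file under these assumptions.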

Creating a dataset

To create a dataset:
  1. Go to a notebook.
  2. Open the project panel using the icon in the left sidebar.
  3. Click on the icon.
  4. Select Dataset.
  5. Provide a Name.
  6. Click Create.

Uploading files to a dataset

To upload files to a dataset:
  1. Hover over the dataset to access the menu.
  2. Click Add files...
  3. Drag and drop the file into the upload window, or click the Upload from computer link to search for and select the file.
A dataset and its contents are accessible to everybody in the space.

Reading files from datasets

To access files that are in a dataset from your notebook code, you need to read them into the kernel environment.

To read all dataset files into the environment

UI
  1. Hover over the dataset to access the menu.
  2. Select Read dataset into environment (a cell will be created with code).
  3. Run the code cell.

Python
  1. Create a new Python cell.
  2. Run the following code, replacing <name> with your dataset's name:
# will pull in all files within a given dataset
%ntbl pull datasets <name>

To read specific dataset files into the environment

UI
  1. Expand the dataset contents by using the chevron icon.
  2. Hover over the file to access the menu.
  3. Select Read file into environment (a cell will be created with code).
  4. Run the code cell.

Python
  1. Create a new Python cell.
  2. Run the following code, replacing <name> with your dataset's name and <file_path> with the file name or path to the file:
# will pull a specific file from a dataset
%ntbl pull datasets <name>/<file_path>
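When several specific files are needed, the pull command above can be issued in a loop instead of one cell per file. The sketch below is an illustration under assumptions: pull_args and pull_files are hypothetical helper names, and it assumes the ntbl magic is registered in the running kernel exactly as shown above. It uses IPython's run_line_magic to invoke the magic from Python code.

```python
from typing import Iterable, Optional


def pull_args(name: str, file_path: Optional[str] = None) -> str:
    """Build the argument line that follows %ntbl in the commands above."""
    if not name:
        raise ValueError("dataset name must not be empty")
    if file_path is None:
        target = name  # pull every file in the dataset
    else:
        target = f"{name}/{file_path.lstrip('/')}"  # pull a single file
    return f"pull datasets {target}"


def pull_files(name: str, file_paths: Iterable[str]) -> None:
    """Pull several files, one %ntbl call per file.

    Assumption: this runs inside a notebook kernel where the ntbl magic
    is available. It raises RuntimeError outside of IPython.
    """
    from IPython import get_ipython  # imported lazily; only needed in a kernel

    shell = get_ipython()
    if shell is None:
        raise RuntimeError("pull_files must be run inside a notebook kernel")
    for file_path in file_paths:
        shell.run_line_magic("ntbl", pull_args(name, file_path))
```

In a notebook cell, pull_files("<name>", ["<file_path_1>", "<file_path_2>"]) would then pull only those two files.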

Accessing the file

Once the files have been loaded into the kernel, they're accessible from the ../datasets directory.
The easiest way to get the full path to the file is to copy it from the dataset file menu:
  1. Hover over the dataset to access the menu.
  2. Select Copy Path.
  3. Paste the path as needed in code.
# Reads a file called file_name.csv into a pandas dataframe
import pandas as pd
df = pd.read_csv('../datasets/dataset_name/file_name.csv')
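If a file has not been pulled yet, reading it fails with a generic "file not found" error. The sketch below builds the path from the ../datasets layout shown above and checks that the file exists first. The helper names dataset_file and list_dataset_files are hypothetical, and the layout ../datasets/<dataset_name>/<file_path> is inferred from the example path in this section.

```python
from pathlib import Path

# Taken from the example path in this section.
DATASETS_ROOT = Path("../datasets")


def dataset_file(dataset_name: str, file_path: str, root=DATASETS_ROOT) -> Path:
    """Return the local path of a pulled dataset file, or raise if it is absent."""
    path = Path(root) / dataset_name / file_path
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; read the file into the environment first"
        )
    return path


def list_dataset_files(dataset_name: str, root=DATASETS_ROOT) -> list:
    """List pulled files for a dataset as paths relative to the dataset directory."""
    base = Path(root) / dataset_name
    if not base.is_dir():
        return []  # nothing has been pulled for this dataset yet
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())
```

The returned path can be passed straight to pandas, for example pd.read_csv(dataset_file("dataset_name", "file_name.csv")).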

Writing changes to dataset

Once loaded, dataset files are local copies in your notebook's kernel, so edits to them are not saved to the dataset automatically. To persist your changes, push them back to the dataset.

To write all changes back to the dataset

UI
  • Hover over the dataset to access the menu.
  • Select Write changes to dataset (a cell will be created with code).
  • Run the code cell.

Python
  1. Create a new Python cell.
  2. Run the following code, replacing <name> with your dataset's name:
# will push all changes back to a dataset
%ntbl push datasets <name>
If your changes are in a pandas dataframe, write them out to a file in the dataset's directory before pushing:
df.to_csv("../datasets/<file_path>")
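Before pushing, it can help to see which local files actually differ from what was pulled. The sketch below is a local check only and says nothing about how the push itself detects changes. The names snapshot and diff_snapshots are hypothetical. It hashes file contents, which can take a while for multi-gigabyte files.

```python
import hashlib
from pathlib import Path


def snapshot(directory) -> dict:
    """Map each file under directory (relative path) to a SHA-256 digest."""
    base = Path(directory)
    digests = {}
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        sha = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files are never fully in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                sha.update(chunk)
        digests[path.relative_to(base).as_posix()] = sha.hexdigest()
    return digests


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report files added, removed, or modified between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(name for name in common if before[name] != after[name]),
    }
```

A typical use is to call snapshot("../datasets/<name>") right after pulling, call it again after editing, and inspect diff_snapshots(before, after) before running the push.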

Deleting datasets or dataset files

To delete a dataset or dataset file:
  1. Go to a notebook in the space.
  2. Open the project panel using the icon in the left sidebar.
  3. Hover over the dataset or dataset file to access the menu.
  4. Select Delete and confirm your choice.

Learn more

Datasets demo