How nbdev helps us structure our data science workflow in Jupyter Notebooks

Jupyter notebooks have rightly been praised for making it easy and intuitive to experiment with code, visualize results and describe your process in nicely formatted Markdown cells. In our work as data scientists and machine learning engineers at 20tree.ai, we use notebooks all the time, and they are a great tool in our toolbox. However, they also bring some downsides to our workflow. For one thing, working in notebooks often results in unstructured, poorly documented and untested code that takes a lot of time to port into a proper codebase. Also, version control of Jupyter notebooks is a disaster: a notebook is stored as a JSON file that includes outputs, execution counts and metadata, so every commit contains large changes that are hard to trace back.

nbdev solves many of these issues by making it easy to turn Jupyter notebooks into proper Python libraries and documentation. It incentivizes us to write clear code, use Git version control properly, and document and test our codebase continuously, while preserving the benefits of interactive notebooks in which it is easy to experiment. Using nbdev has improved our workflow significantly, and perhaps it can do the same for you!

What is nbdev

nbdev is a library created by fast.ai that forms the missing link between exploring data in Jupyter notebooks and writing an actual codebase that produces high-quality software. In short, nbdev provides a framework for:

  • Exporting selected notebook cells into a Python library
  • Automatically generating documentation based on function signatures and notebook cells
  • Running notebook cells as unit tests
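To give an idea of how documentation can be generated from function signatures, here is a minimal sketch built on Python's standard inspect module. The helper show_doc_sketch and the example function normalize are hypothetical; nbdev's own show_doc function does considerably more than this.

```python
import inspect


def show_doc_sketch(func) -> str:
    """Render a Markdown snippet from a function's name, signature and docstring.

    Hypothetical helper, loosely mimicking a documentation generator;
    it is not nbdev's implementation.
    """
    signature = inspect.signature(func)  # includes annotations and defaults
    doc = inspect.getdoc(func) or ""
    return f"### `{func.__name__}`\n\n`{func.__name__}{signature}`\n\n{doc}"


def normalize(values: list, scale: float = 1.0) -> list:
    "Divide every value by the maximum absolute value, then multiply by `scale`."
    peak = max(abs(v) for v in values)
    return [scale * v / peak for v in values]


print(show_doc_sketch(normalize))
```

Because the signature and docstring are read from the live function object, the generated text cannot drift out of sync with the code, which is the main appeal of generating docs this way.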

Of course, it has many more features, which you can find at nbdev.fast.ai or in the nbdev launch post, but these are the ones we use most. In addition, nbdev provides utilities for stripping notebook metadata and handling merge conflicts, both of which greatly improve development cycles (and general quality of life 🙂) when checking notebooks into a Git repository.
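The idea behind stripping metadata can be sketched directly on a notebook's JSON. The function clean_notebook below is a hypothetical, simplified illustration, not nbdev's implementation: it resets execution counts and drops cell-level and notebook-level metadata (keeping only kernelspec), while leaving outputs in place so they can still end up in the documentation. Exactly which fields nbdev removes depends on its version and settings.

```python
import copy
import json


def clean_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with noisy, diff-unfriendly fields removed.

    Simplified sketch: resets execution counts, drops cell and notebook
    metadata (keeping only "kernelspec"), and leaves cell outputs in place.
    """
    nb = copy.deepcopy(nb)  # never mutate the caller's notebook
    nb["metadata"] = {k: v for k, v in nb.get("metadata", {}).items() if k == "kernelspec"}
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                # Outputs carry their own execution_count, which changes on every run.
                if "execution_count" in output:
                    output["execution_count"] = None
    return nb


# A tiny hypothetical notebook, as it might look after running a cell a few times.
raw = {
    "metadata": {"kernelspec": {"name": "python3"}, "language_info": {"version": "3.8.5"}},
    "cells": [
        {"cell_type": "code", "execution_count": 17, "metadata": {"scrolled": True},
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result", "execution_count": 17,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
    ],
}
print(json.dumps(clean_notebook(raw), indent=1))
```

Re-running the notebook now only changes the file when the code or its outputs actually change, which keeps Git diffs small.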

Every notebook that starts with a #default_exp tag gets exported to a Python module, automatically following best practices such as defining which names are imported when you do from module import *, by setting __all__ = ['function1', 'function2', …].
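To see what __all__ does, here is a small, self-contained sketch. The module name demo_module and its functions are hypothetical; the module is built in memory and registered in sys.modules so the example runs without any files.

```python
import sys
import types

# Source of a module as an exporter might write it (hypothetical names).
MODULE_SOURCE = '''
__all__ = ["function1", "function2"]

def function1():
    return "one"

def function2():
    return "two"

def _helper():
    return "private"

def scratch():
    return "not listed in __all__"
'''

module = types.ModuleType("demo_module")
exec(MODULE_SOURCE, module.__dict__)
sys.modules["demo_module"] = module  # make it importable by name

namespace = {}
exec("from demo_module import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)  # → ['function1', 'function2']
```

Only the names listed in __all__ are picked up by the star import; scratch is still reachable as demo_module.scratch, it just does not leak into the caller's namespace.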

The magic of nbdev is that it doesn’t actually change programming that much: you add a # export or # hide tag to a notebook cell once in a while, and you run nbdev_build_lib and nbdev_build_docs when you finish your code. That’s it! There’s nothing new to learn and nothing to unlearn. It’s just notebooks. More than anything, nbdev encourages us to write cleaner notebooks with a clear separation between code (the cells that get exported) and experimentation, visualization and testing (the cells that don’t). As a result, our notebooks are more readable and easier to share with the whole team. As a wise (wo)man once said, the real nbdev was inside us all along.
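The export step can be sketched in a few lines, assuming notebooks are read as plain JSON. export_notebook is a hypothetical, simplified illustration of the idea, not nbdev's code: it only understands #default_exp and # export (with or without a space after #), ignores # hide (which in nbdev affects the documentation rather than the export), and builds __all__ from public top-level functions and classes.

```python
import ast
import re

DEFAULT_EXP = re.compile(r"^\s*#\s*default_exp\s+(\S+)", re.MULTILINE)
EXPORT_FLAG = re.compile(r"^\s*#\s*export\s*$")


def cell_source(cell: dict) -> str:
    # Notebook JSON may store a cell's source as a list of lines or one string.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def export_notebook(nb: dict):
    """Return (module_name, module_source) for a notebook dict (simplified sketch)."""
    module_name = None
    exported = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell_source(cell)
        match = DEFAULT_EXP.search(src)
        if match and module_name is None:
            module_name = match.group(1)
        lines = src.splitlines()
        if lines and EXPORT_FLAG.match(lines[0]):
            # Drop the flag line itself and keep the rest of the cell.
            exported.append("\n".join(lines[1:]).strip())
    body = "\n\n".join(exported)
    # Public top-level functions and classes go into __all__.
    public = [
        node.name
        for node in ast.parse(body).body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and not node.name.startswith("_")
    ]
    return module_name, f"__all__ = {public!r}\n\n{body}\n"


nb = {"cells": [
    {"cell_type": "code", "source": ["# default_exp augment"]},
    {"cell_type": "code", "source": ["# export\n", "def flip(rows):\n", "    return [r[::-1] for r in rows]"]},
    {"cell_type": "code", "source": ["assert flip([[1, 2]]) == [[2, 1]]"]},
    {"cell_type": "markdown", "source": ["Some explanation"]},
]}
name, source = export_notebook(nb)
print(name)  # → augment
print(source)
```

The test cell and the Markdown cell stay in the notebook, where they serve as tests and documentation, but never reach the exported module.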

index.ipynb with the generated documentation from this notebook next to it.

Why do we use it?

As data scientists, much of our work involves, as you might expect, data. This means loading data, transforming data, combining data and, at some point, actually using that data. Especially in the transforming and combining stages, it’s critical to ensure that no mistakes slip in. If you are training a neural network for semantic segmentation but your segmentation map is shifted by a few pixels, your data is essentially invalid. Worse yet, valuable time is often lost trying to get code or machine learning models to work, when the real culprit is a small, stupid but bothersome bug caused by a typo. Because notebooks are run iteratively, cell by cell, it’s almost like you’re debugging while coding, so errors are caught much more quickly. Here at 20tree.ai, we mostly work with georeferenced (satellite) data. When different parts of your data are projected in different coordinate reference systems, it (again) becomes very easy for mistakes to slip in.
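One way to guard against such misalignment is a small check cell that runs before image and labels are combined. The sketch below assumes a hypothetical RasterMeta record holding a raster's CRS, six affine transform coefficients (encoding pixel size and the position of the top-left corner, in a convention stated in the code) and its shape; in practice these values would come from a geospatial library, whose API we don't reproduce here. A mask shifted by a single 10 m pixel is enough to trip the check.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RasterMeta:
    """Hypothetical minimal description of a georeferenced raster."""
    crs: str  # coordinate reference system, e.g. "EPSG:32631"
    # Affine coefficients (a, b, c, d, e, f), assumed to map pixel (col, row) to
    # map coordinates as x = a*col + b*row + c and y = d*col + e*row + f.
    transform: Tuple[float, float, float, float, float, float]
    shape: Tuple[int, int]  # (rows, cols)


def assert_aligned(image: RasterMeta, mask: RasterMeta, tol: float = 1e-9) -> None:
    """Raise ValueError unless image and mask share one CRS and one pixel grid."""
    if image.crs != mask.crs:
        raise ValueError(f"CRS mismatch: {image.crs} vs {mask.crs}")
    if image.shape != mask.shape:
        raise ValueError(f"shape mismatch: {image.shape} vs {mask.shape}")
    if not all(math.isclose(p, q, abs_tol=tol) for p, q in zip(image.transform, mask.transform)):
        raise ValueError(f"transform mismatch: {image.transform} vs {mask.transform}")


# Hypothetical example values: 10 m pixels in a UTM projection.
image = RasterMeta("EPSG:32631", (10.0, 0.0, 500000.0, 0.0, -10.0, 5800000.0), (256, 256))
shifted_mask = RasterMeta("EPSG:32631", (10.0, 0.0, 500010.0, 0.0, -10.0, 5800000.0), (256, 256))

assert_aligned(image, image)  # passes silently
try:
    assert_aligned(image, shifted_mask)  # mask is one pixel (10 m) further east
except ValueError as err:
    print(err)
```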

When developing new code, a fairly standard pattern for us consists of the following:

  1. Small functions are written in a Jupyter notebook. The notebook is used to visually inspect the output and to informally test that the code behaves as expected;
  2. The functions get copy-pasted into a proper codebase;
  3. The original notebooks are scattered to the wind;
  4. The code changes over time, and maybe a mistake slips in. When asking for more details on a bit of code, someone points to some file called Untitled_v3_better_labels2.ipynb with the comment “it’s probably very outdated though”.

With nbdev, we can nip this whole sequence in the bud. You write your code in a Jupyter notebook and that’s it. You’re done, because the notebook is the proper codebase! The main code cells are exported to the library, and the output of other cells serves both as the visual explanation and as unit tests (which you would otherwise usually have to write separately). These tests are run automatically when you push the notebook to GitHub. So, while you code iteratively, documentation and testing come (almost) for free. For example, after implementing data augmentation, you’re surely going to visualize the outputs to ensure that the data looks as you would expect and, just as importantly, that the labels are transformed in the same way. With nbdev, this visualization simply lives in your codebase, right below the function definition.
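A minimal, text-only sketch of this pattern, assuming a hypothetical random_flip augmentation that works on plain nested lists instead of real image arrays. In a notebook, the function would sit in a # export cell and the loop below it in a regular cell, which nbdev would run as a test. The check asserts that the mask still matches the image after every random flip, which is exactly the kind of label misalignment described earlier.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]


def random_flip(image: Grid, mask: Grid, rng: random.Random) -> Tuple[Grid, Grid]:
    """Flip image and mask horizontally and/or vertically using the same random draws.

    Hypothetical augmentation for illustration; real code would work on arrays.
    """
    if rng.random() < 0.5:  # horizontal flip: reverse every row
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    if rng.random() < 0.5:  # vertical flip: reverse the row order
        image = image[::-1]
        mask = mask[::-1]
    return image, mask


# Test cell: the label marks "bright" pixels (value > 3) of the image.
image = [[1, 2, 3], [4, 5, 6]]
mask = [[int(v > 3) for v in row] for row in image]

rng = random.Random(0)
for _ in range(20):
    aug_image, aug_mask = random_flip(image, mask, rng)
    # The label must still describe the same pixels after augmentation.
    assert aug_mask == [[int(v > 3) for v in row] for row in aug_image]
```

Deriving the expected mask from the augmented image, rather than hard-coding it, lets the same assertion cover every combination of flips.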


These tests can also easily be used in your GitHub continuous integration pipeline, which makes it simple to run proper checks before merging new code into your existing codebase.
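What such a CI step boils down to can be sketched as a script that executes every notebook's code cells in order and reports failure if any cell raises. This is a simplified illustration with hypothetical names (run_notebook, main); it does not start a Jupyter kernel, so cells using IPython magics such as %matplotlib would fail. In practice you would call nbdev's own test command (nbdev_test_nbs in the 1.x releases) from the CI workflow instead.

```python
import json
import sys
import tempfile
import traceback
from pathlib import Path


def run_notebook(path: Path) -> bool:
    """Execute a notebook's code cells in order; return True if none raise.

    Simplified sketch: no kernel, no IPython magics, no skip flags.
    """
    nb = json.loads(path.read_text(encoding="utf-8"))
    namespace = {"__name__": "__main__"}  # cells share state, like in a kernel
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        src = "".join(src) if isinstance(src, list) else src
        try:
            exec(compile(src, f"{path.name}[cell {index}]", "exec"), namespace)
        except Exception:
            print(f"FAILED: {path} (cell {index})", file=sys.stderr)
            traceback.print_exc()
            return False
    return True


def main(folder: str = ".") -> int:
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    failures = [path for path in notebooks if not run_notebook(path)]
    print(f"{len(notebooks) - len(failures)}/{len(notebooks)} notebooks passed")
    return 1 if failures else 0  # a non-zero exit code fails the CI job


# Demo: one passing and one failing notebook in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    good = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "assert x * 2 == 4"]}]}
    bad = {"cells": [{"cell_type": "code", "source": "assert 1 + 1 == 3"}]}
    Path(tmp, "good.ipynb").write_text(json.dumps(good), encoding="utf-8")
    Path(tmp, "bad.ipynb").write_text(json.dumps(bad), encoding="utf-8")
    exit_code = main(tmp)
```

In a CI job, the script would end with sys.exit(main("path/to/notebooks")) (the path being a placeholder) so that a failing notebook fails the build.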

The fact that your entire codebase lives in notebooks also means that when something is not working as expected, it is easy and intuitive to debug: you can quickly change something and see how it affects the output.
