Software Engineering for Data Scientists

Machine Learning Vs Software Engineering Differences

Testing and Logging Code

  • Unlike typical unit tests, tests for machine learning code need to handle stochastic outputs. The logs also need to be designed with this in mind.
  • Testing and logging are two key aspects of maintaining a large-scale, enterprise-quality project.
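As a rough illustration of a test that handles stochastic output, the sketch below fixes the random seed for reproducibility and checks results against a tolerance rather than an exact value. The function `noisy_estimate`, the logger name, and the tolerance of 0.02 are hypothetical choices made for this example, not something prescribed by the article.

```python
import logging
import random
import statistics

logger = logging.getLogger("model_tests")  # hypothetical logger name


def noisy_estimate(rng, n=10_000):
    """Hypothetical stochastic routine: Monte Carlo estimate of the mean of Uniform(0, 1)."""
    return statistics.fmean(rng.random() for _ in range(n))


def test_reproducible_with_seed():
    # The same seed must give the same result, so failures can be replayed.
    assert noisy_estimate(random.Random(42)) == noisy_estimate(random.Random(42))


def test_within_tolerance():
    # Different seeds give different results; assert a tolerance band, not equality.
    for seed in range(20):
        value = noisy_estimate(random.Random(seed))
        # Log the seed next to the value so a failing run can be reproduced.
        logger.info("seed=%d estimate=%.4f", seed, value)
        assert abs(value - 0.5) < 0.02
```

Logging the seed alongside each result is one way the logs differ from those of deterministic code: without it, a failure seen once may be impossible to replay.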

Legacy Systems

Working Effectively with Legacy Code

  • Unit Tests: Unit tests pin down the expected output of the core functionality.
  • Refactoring: With unit tests in place, we can refactor the code as needed, because the tests tell us what the output should look like after each change.
  • The refactoring should involve breaking large classes into smaller ones.
  • Reuse similar pieces of code and make the whole thing more readable.
  • Document the stack flow and the logic.
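The test-then-refactor approach above can be sketched as follows. Everything here is hypothetical: `legacy_price` stands in for a piece of legacy code, and the example splits one function into smaller ones rather than one class into smaller classes, but the idea is the same. The tests first record what the existing code returns, and the refactored version must reproduce it.

```python
def legacy_price(quantity, unit_price, customer_type):
    """Hypothetical legacy function with several rules tangled together."""
    total = quantity * unit_price
    if customer_type == "gold":
        total = total - total * 0.1
    if quantity > 100:
        total = total - 5
    return round(total, 2)


# Step 1: record the current behaviour for representative inputs.
# In practice these outputs would be saved once and kept under version
# control; here they are captured in memory from the legacy function.
CASES = [(1, 9.99, "regular"), (150, 2.0, "gold"), (100, 1.0, "gold")]
EXPECTED = {case: legacy_price(*case) for case in CASES}


# Step 2: refactor into small, named, reusable pieces.
def apply_customer_discount(total, customer_type):
    return total - total * 0.1 if customer_type == "gold" else total


def apply_bulk_rebate(total, quantity):
    return total - 5 if quantity > 100 else total


def refactored_price(quantity, unit_price, customer_type):
    total = apply_customer_discount(quantity * unit_price, customer_type)
    return round(apply_bulk_rebate(total, quantity), 2)


# Step 3: the refactored code must match the recorded behaviour.
def test_refactor_preserves_behaviour():
    for case, expected in EXPECTED.items():
        assert refactored_price(*case) == expected
```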

Production Machine Learning 101

  • We know the features to use
  • The model to use
  • The metric we are using to measure the model.

Databases and Pipelines

  • Databases with data pipelines and scheduled updates.
  • Ensure indexes and keys (primary and foreign) are in place.
  • A dedicated modelling table with the necessary features. This allows us to pull the data into RAM (as a pandas DataFrame) and start modelling.
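A minimal sketch of these points, using an in-memory SQLite database from the Python standard library. The table and column names (`customers`, `orders`, `modelling_features`) are made up for illustration; a real pipeline would run the feature query on a schedule against the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Source tables with primary keys, a foreign key, and an index on the join column.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "north"), (2, "south")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0)],
)

# Dedicated modelling table: one row per customer with ready-to-use features.
conn.execute("""
CREATE TABLE modelling_features AS
SELECT c.customer_id,
       c.region,
       COUNT(o.order_id) AS n_orders,
       SUM(o.amount)     AS total_amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.region
""")

rows = conn.execute(
    "SELECT customer_id, region, n_orders, total_amount "
    "FROM modelling_features ORDER BY customer_id"
).fetchall()
```

With pandas installed, `pandas.read_sql_query("SELECT * FROM modelling_features", conn)` would load the same table into a DataFrame.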

Servers & Frameworks

  • Data: Temporary datasets can be saved as CSV files.
  • Features: On-the-fly feature generation.
  • Modelling: The modelling modules go here. This includes any unsupervised steps, such as clustering.
  • Visualization: An extra module to produce graphs and any other outputs from the generated reports.

Machine Learning Operations (MLOps)

  • The outputs can vary; how do you test for that?
  • How to catch drift in the data and recognise when the model needs retraining.
  • How often to retrain and update the model parameters.
  • How to monitor model performance.
  • How to quickly diagnose issues in system performance.
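The article does not name a drift-detection method. One commonly used option is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in recent data. The sketch below is a plain-Python version under assumed choices: ten equal-width bins and a small `eps` floor to avoid division by zero. The often-quoted thresholds (below 0.1 stable, above 0.25 significant drift) are a rule of thumb, not a guarantee.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the reference range land in the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Synthetic example data: training sample, a fresh sample from the same
# distribution, and a sample whose mean has shifted by one standard deviation.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = psi(train, same)
psi_shifted = psi(train, shifted)
```

Running a check like this per feature on a schedule, and logging the score, gives a simple trigger for deciding when retraining is worth investigating.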

Abhinav Unnam
