Stefanie Molin recently wrote the technical book “Hands-On Data Analysis with Pandas” (published by Packt on July 26, 2019). Her work shows readers how to analyze data and get started with machine learning in Python using the powerful pandas library. She’s a software engineer and data scientist, and a member of the Security Data Science team at Bloomberg that researches and develops solutions using data and machine learning to help improve and automate Bloomberg’s information security processes. In her job, Stefanie focuses on identifying and answering security-related questions using data and developing software to solve them. She holds a bachelor’s degree in Operations Research from Columbia University’s Fu Foundation School of Engineering and Applied Science (CUSEAS), with minors in Economics, and Entrepreneurship and Innovation.
Our conversation with Stefanie about her book was edited for length and clarity.
What first interested you in data analysis, Python and pandas?
I started my career working in ad tech, where I had access to log-level data from the ads that were being served, and I learned R to provide insights to clients. I wanted to focus more on data analysis, so I switched jobs. After joining another company, I participated in an internal hackathon, during which my team developed an alerting system. We won, which gave me the confidence to continue on this path.
I shifted my focus towards coding and transitioned to another team, where I developed and delivered training on R and how to use R for data analysis. I saw people benefit from this work, plus creating the training helped me fill in the gaps in my own knowledge and skill set. We eventually switched to Python, and I learned that on my own, along with machine learning. Through this, it became clear that a working knowledge of pandas is essential for these kinds of data-rich analyses.
In your words, what is your book about? Who is your target audience?
My book is about pandas – you simply can’t do data science in Python without pandas – and it covers data analysis and machine learning. Since data skills have become essential in a variety of fields, the target audience is anyone who has prior data science experience and now wants to move to Python or someone who has experience programming in Python and wants to learn data science.
The book assumes that readers have some Python knowledge, which can be easily learned from a tutorial. It is designed to help someone conduct data analysis and machine learning in Python through examples that use interesting data sets. There are various examples of code and applications of that code rather than just the mathematics and theory.
What skills will readers gain from this book?
The book covers statistics, working with APIs, data wrangling, data simulations, and visualization, and then builds on these skills with machine learning using scikit-learn. The goal is for readers to apply the skills they’ve learned to conduct an analysis entirely in Python, meaning they can find, manipulate, and visualize their data; build a machine learning model; and analyze the results. I want people to be able to connect what’s in the book to their work, so I used some realistic examples.
How long did the book take to write?
I worked on the book for 11 months, mostly during weekends. I started the outline in August 2018 and worked about 20 hours a week on average. As the publication date approached, this grew to about 40 hours a week. During the last few months, I was fortunate to be able to work from home a few days each week so that I could use my time more efficiently.
What was your writing process?
I collaborated with my publisher on the book’s structure and then wrote an outline that included an idea for each chapter and a short summary. I later expanded the outline with more explicit detail and deadlines. Breaking everything into manageable pieces was something I had always done while coding, so it was a natural way for me to structure my writing process.
How did you develop your code examples?
I didn’t want to use data that worked perfectly or was created for teaching purposes, as real-world models aren’t perfect – I wanted to demystify the underlying concepts and performance expectations. The examples show readers how to bundle all the functionality required for an analysis into a Python package, so they can build upon them. In addition to a finance example, I included examples using data about wine, exoplanets, weather, and earthquakes, to name a few.
There are three application chapters, and chapter 7 includes an example from finance. Since I didn’t want to alienate any readers, I developed a scenario that required building a Python package for all the tasks, like cleaning data, calculating beta and alpha, visualization, and basic modeling, which is practical knowledge that can be applied to any domain. Finance isn’t my field of expertise, so I researched financial analysis and sought out people from Bloomberg’s Global Data team to review the financial concepts.
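For readers unfamiliar with the terms, here is a minimal sketch of the beta and alpha calculations mentioned above. It assumes the standard CAPM definitions (beta as the covariance of asset and market returns divided by the market variance, alpha as the excess return not explained by beta). It is not code from the book: the function names, the `risk_free_rate` parameter, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def beta(asset_returns: pd.Series, market_returns: pd.Series) -> float:
    """Beta: covariance of asset and market returns over market variance."""
    return asset_returns.cov(market_returns) / market_returns.var()


def alpha(asset_returns: pd.Series, market_returns: pd.Series,
          risk_free_rate: float = 0.0) -> float:
    """Per-period alpha: average excess return not explained by beta."""
    b = beta(asset_returns, market_returns)
    asset_excess = asset_returns.mean() - risk_free_rate
    market_excess = market_returns.mean() - risk_free_rate
    return asset_excess - b * market_excess


# Synthetic (made-up) market prices; real data would come from a file or API.
rng = np.random.default_rng(0)
market_prices = pd.Series(100 * np.cumprod(1 + rng.normal(0.0005, 0.01, 250)))

# Convert prices to period-over-period returns and drop the leading NaN.
market_returns = market_prices.pct_change().dropna()

# Hypothetical asset built to have beta 1.5 and alpha 0.001 per period.
asset_returns = 0.001 + 1.5 * market_returns

print(f"beta:  {beta(asset_returns, market_returns):.4f}")
print(f"alpha: {alpha(asset_returns, market_returns):.4f}")
```

Because the synthetic asset is an exact linear function of the market, the functions recover the beta and alpha it was constructed with; real return series are noisy, so the estimates would vary with the sample period.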
What support did you have from your managers and engineering peers?
I could not have written this book without the support of my manager, colleagues, friends, and family. My coworkers in Bloomberg’s Office of the CTO, Chief Information Security Office (CISO), and Global Data were invaluable resources – we have a great culture here at Bloomberg and everyone was so willing to help.
My manager and I have a very open line of communication, and we frequently discussed the status of my writing. Since the book’s theme is relevant to my work, my manager recognized that the extra time I dedicated to it would benefit our team in the long term.
My manager and colleagues across departments reviewed drafts of the chapters, provided edits, and helped refine my ideas. Friends also reviewed and validated the content.
What challenges did you have writing the book? What skills did you gain from the process that you have already put to use in your role?
Since I had to be constantly “on” throughout this process, I learned to work efficiently and to quickly switch focus between my job and my book. I also learned when to ask for and accept help – I could not have finished this book by myself.
I had to learn to accept criticism and not take it personally, which can be difficult, especially when you’ve spent so much time working on something. While those first review sessions were hard, my writing improved over time, and there was a big difference in the later chapters. Also, perfection is an impossible goal – at some point you just have to let it go.
Time management and working remotely with people were definitely challenges. I had to learn to explain complex concepts and ideas over email and text rather than face-to-face. I also had to ensure that my writing schedule didn’t take up all my personal time and impact my personal relationships.
Do you have plans to write additional books?
Not at the moment. I like to stay busy though. My next goal is to pursue an online Master’s degree in computer science from Georgia Tech, while continuing my work at Bloomberg.