In this interview with host Aaron Fifield of Chat With Traders, Bowne-Anderson discusses the importance of learning a programming language, and what advantages are enjoyed by traders who can code; from collecting stats on market behaviour, to performing research in a robust data-driven way, and backtesting and analysing trading ideas.
So how long would it take you to become fluent in a programming language? Find out in the excerpts below, or listen to the interview in full here...
“I’ve met a significant number of people who work in finance and big companies, who have lost significant amounts of money due to spreadsheet errors.” - Hugo Bowne-Anderson
On his route into programming...
My passage to programming is one of many routes that people take these days. My background is actually in science and humanities. I did my undergrad, I did a combined bachelor of arts and science and had a double major in pure and applied maths. I finished the science part and then went to grad school, the University of Sydney, then I went to do a PhD at the University of New South Wales, in pure maths. I tried to finish my arts degree at the same time, but they kicked me out because you can’t be enrolled in two different degrees at two different institutions, and I wasn’t aware of that. So I got kicked out of my second undergrad.
When I finished my pure maths PhD, I realised I wanted to get into more applied maths. So I actually started working in biology; I started a postdoc in Germany, in Dresden, doing applied mathematics in a biology lab, thinking about cell growth, cell dynamics and the biophysics behind these complex systems. This was the first time I started working with a lot of data. Biologists kept asking me the same questions, about statistical tests, about cleaning their data, about learning R and Python etc. So I essentially taught myself data analysis and statistics and a bunch of machine learning to stay ahead of the curve working in these fields.
I then accepted a professorship at Yale University, where I spent two years doing research, and I was doing a lot of teaching and education. Because biologists kept on asking me the same questions, I started to run what I called ‘practical statistics and data science’ workshops. At that point, I met the DataCamp team, who were around 8-9 people then (now 75), and they were really trying to get moving on their Python curriculum. So they put the hard sell on me and I jumped on board to come and build online education systems to teach in-browser data science.
On self-start coding...
My job was to do applied mathematical modelling and whatever that may involve. I realised at that point that it actually involved learning a lot about the data I was trying to model, learning about the data generation process and thinking about it statistically, and the way we do that these days is using programming languages.
Essentially people used to flip coins and use pen and paper, but if you want to use large amounts of data, complex data from different sources these days, you need to do it using programming. I did self-start in that way and I think that that’s a lot of people’s journey to coding as well. A lot of data scientists and coders don’t have computer science or software engineering degrees, they did learn on the job.
On when and why someone should learn to code...
Firstly I want to say that one of the alternatives is spreadsheets. And I think spreadsheets are amazing in a lot of ways; tens of millions of people are using spreadsheets to do their job. I do think there are challenges involved in using spreadsheets; I’ve met a significant number of people who work in finance and big companies, who have lost significant amounts of money due to spreadsheet errors, for example. If you’re working in spreadsheets I definitely think it can serve you well, but essentially in today’s job market, and even for a lot of traders, if you want to have an edge or a differentiating factor for your practice and the type of work you do, I think programming is definitely a very viable option.
We now see companies like Citigroup, Bank of America, JP Morgan; all these places put Python first now. I think it was Robin Wigglesworth in the Financial Times a few years back who wrote that ‘it used to be traders who were first class citizens of the financial world, but it’s technologists now who are the priority’, along with people who can work robustly with large amounts of data from numerous sources. The way to do this now is through programming, whether that’s Python or R or databases etc.
But the real conversation needs to revolve around whether you can do this stuff within a GUI (Graphical User Interface), which Excel is an example of, or whether you want to do it in a programming language. I’d like to give you a few things about programming that I think really help with this type of process. The first is that it’s reproducible. When I’m using a spreadsheet or some sort of GUI, I do a bunch of clicks and send you my results, and you can’t reproduce what I did; all you have are the results. Whereas if it’s text-based I can share it with you straight away, so it’s reproducible in that sense: you can reproduce it on your own operating system. It’s ‘repeatable’ and on top of that it’s what we call ‘diffable’; if you write some code to automate a process and then you change it, because it’s plain text, we can see exactly what changes have been made, and we can see the workflow and the process there.
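As an illustration of the point about plain-text changes being visible (this is not from the interview), here is a minimal sketch using Python's standard `difflib` module. The two script versions, the file names and the `load_prices` and `moving_average` names inside them are hypothetical; they are only text being compared and are never executed.

```python
import difflib

# Two versions of a small, hypothetical analysis script, held as plain text.
old_script = """prices = load_prices("prices.csv")
window = 20
signal = moving_average(prices, window)
""".splitlines(keepends=True)

new_script = """prices = load_prices("prices.csv")
window = 50
signal = moving_average(prices, window)
""".splitlines(keepends=True)

# unified_diff yields the header lines plus every removed (-) and added (+) line.
diff = list(
    difflib.unified_diff(
        old_script, new_script, fromfile="analysis_v1.py", tofile="analysis_v2.py"
    )
)
print("".join(diff))
```

The printed diff shows that only the `window` line changed between the two versions, which is the kind of record a series of clicks in a GUI does not leave behind.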
On top of that, one of the strongest aspects of coding and programming is that it automates stuff. I’ve hinted at this throughout what I’ve said already, but if you do something, e.g. pivot tables, a few days a week and you can write three lines of code that does it for you, that’s just common sense to me. Pointing and clicking isn’t scalable in that way. I think these provide general arguments for why programming is good.
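To make the pivot table remark concrete, here is a minimal sketch assuming the pandas library is installed. The interview does not name a library for this, so pandas is an assumption, and the trade log, its column names and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical trade log; in practice this might be loaded from a file instead.
trades = pd.DataFrame(
    {
        "symbol": ["AAPL", "AAPL", "MSFT", "MSFT", "AAPL", "MSFT"],
        "side": ["buy", "sell", "buy", "sell", "buy", "buy"],
        "quantity": [100, 40, 50, 20, 10, 30],
    }
)

# The pivot itself: total quantity per symbol (rows) and side (columns).
pivot = trades.pivot_table(
    index="symbol", columns="side", values="quantity", aggfunc="sum", fill_value=0
)
print(pivot)
```

Once the data is loaded, the pivot is a single call, and rerunning the script on next week's data repeats exactly the same steps.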
On when to move away from spreadsheets and learn a programming language...
I think it’s beneficial when they’re doing the same stuff all the time. If someone finds themselves in Excel on Monday, Tuesday, Wednesday with different data sets, doing a similar thing, it makes sense for them to learn and write some code in order to automate that, I think. Where it wouldn’t be beneficial is a one-off job, doing a bunch of data entry for example. But in terms of automating time-consuming tasks, it definitely makes sense then.
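The "same thing on different data sets" case can be sketched with the Python standard library alone. This is an illustration rather than anything from the interview: the `summarise` function, the `pnl` column and the three daily exports are hypothetical, and the exports are inlined as text where real ones would be files on disk.

```python
import csv
import io
import statistics


def summarise(csv_text):
    """Return the trade count, mean and best value of the 'pnl' column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    pnl = [float(row["pnl"]) for row in rows]
    return {"trades": len(pnl), "mean_pnl": statistics.mean(pnl), "best": max(pnl)}


# Hypothetical daily exports with made-up numbers.
daily_exports = {
    "monday": "trade_id,pnl\n1,10.0\n2,-4.0\n3,6.0\n",
    "tuesday": "trade_id,pnl\n1,-2.0\n2,8.0\n",
    "wednesday": "trade_id,pnl\n1,5.0\n2,5.0\n3,-1.0\n4,3.0\n",
}

# The same steps run on every data set, with no manual repetition.
report = {day: summarise(text) for day, text in daily_exports.items()}
for day, stats in report.items():
    print(day, stats)
```

Adding Thursday's export is one more entry in the dictionary; the analysis itself does not have to be redone by hand.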
It also makes sense when they want to do more robust modelling and understand why their models are saying what they’re saying. Excel is relatively powerful for modelling, but if you want to build models you can dig into and find out why they’re doing what they’re doing, I think Python and R are both exceptional at this as well. Another aspect is if you want to share your workflow with other people. I would also say one of the problems with spreadsheets is that your data source, logic, functions and formatting are all intertwined. I mean, you see nightmare spreadsheets where people highlight a row in order to mean something, and I think separation of data from this type of logic and from formatting is incredibly important to do robust data analysis and data science.
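One way to picture the separation of data, logic and formatting is the sketch below, which is an illustration and not code from the interview. The positions, the `is_large` rule and its threshold of 1000 are all hypothetical. The point is that the meaning a highlighted row would carry lives in an explicit rule, not in a cell colour.

```python
# Data: plain records with made-up values and no presentation mixed in.
positions = [
    {"symbol": "AAPL", "quantity": 100, "price": 10.0},
    {"symbol": "MSFT", "quantity": 50, "price": 30.0},
    {"symbol": "TSLA", "quantity": 5, "price": 20.0},
]


# Logic: a rule that can be read, tested and changed in one place.
def is_large(position, threshold=1000.0):
    """Return True when the position's value is at or above the threshold."""
    return position["quantity"] * position["price"] >= threshold


# Formatting: presentation only, driven by the rule rather than replacing it.
def format_row(position):
    marker = "*" if is_large(position) else " "
    value = position["quantity"] * position["price"]
    return f"{marker} {position['symbol']:<5} {value:>10.2f}"


large = [p["symbol"] for p in positions if is_large(p)]
for p in positions:
    print(format_row(p))
```

Changing the threshold or the way rows are displayed touches one function each and leaves the underlying data untouched.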
On where people can start out...
I’m definitely biased, I would say DataCamp is definitely a great place and I’ll mention a few others afterwards, but one of the reasons I initially joined DataCamp and one of the reasons that I’m still here is that DataCamp has helped to lower the barrier to entry, in particular the initial steps. When you come to DataCamp you don’t need to install anything locally on your machine, we spin it up for you, it’s all in browser, you’re learning and writing Python code straight away and you feel functional straight away. I think that’s one of the more important things for our learners and learners in general, to feel like you’re doing something, so not spending the first 90 minutes installing something and then having an error returned. I definitely think that’s very motivating and helpful in all honesty.
Depending on whether you’ve chosen R or Python, figure out where you want to write your code. What I mean by that is things called IDEs, Integrated Development Environments; an IDE is just a piece of software that you click on and open, and it’s where you write code, save it, execute it, that type of stuff. In Python (and for R and other languages) there is an incredible project called Project Jupyter, and their Jupyter Notebooks are really good for starting to run code because a notebook is essentially an interactive data science notebook where you can write text, store images and videos, write code and execute it in your browser, and it’s all going straight away. On top of that, on the internet, there are a lot of galleries of interesting Jupyter Notebooks that you can get up and running with.
The two other things I would also suggest are to read as widely as possible (blogs, tutorials; FastForward Labs has a lot of great stuff on data science, and ODSC, which is the Open Data Science Conference, has a lot of finance stuff as well) and, wherever you are, to go to meetups and hackathons and meet people. In all honesty, doing this after work in your own bedroom can get hardcore, we’ve all been there, but seriously, coding with other people and comparing programming, the learning curve there makes it a lot more fun.
The last bit of advice I would give is to try to get your day job to give you time and/or money to do this. For example, if you want to subscribe to DataCamp, we’ve got a whole bunch of free stuff you can check out first, but tell your boss that you’re learning stuff that will help your job. Or try to get Friday afternoons from your work to invest in your future and make you more efficient at your job. I host a podcast called DataFramed where we put out weekly episodes for DataCamp, and I urge you to listen to that as well, because the premise of DataFramed is to interview thought leaders and working data scientists from all over the place, whether it be finance or tech; the most recent one was health.
On the best thing to do when you get stuck...
The amount of frustration you can experience with really idiosyncratic error messages is wild. I do think open-source package developers are kind of getting onto this and realising that if they want more users they need to provide more feedback messages. If you’re getting wacky error messages that you don’t understand, my number one suggestion is Google it. Google knows (or your search engine of choice).
A lot of the time, you’ll end up on a website called Stack Overflow, which is essentially a forum for answering these types of questions. So if you copy and paste your error message into Google…search engines are your best friend.
On how long it will take to learn the basics...
It depends on what you want to do. But if you want to get a sense of how to analyse data and how to get a result, and let’s say you work full-time and can put in, say, five hours a week, I’d say in six months you could get up and running pretty well. Maybe less, but I would say six months is pretty good if you’re efficient and resourceful.
Always remember it’s hard work as well; as we said, it can be frustrating, it can be infuriating at points in all honesty. Particularly when you work full-time, you’ve got a family and all this stuff; you’re taking a lot of time out of your full-time life in order to learn something new. But just remember that this has a pretty serious payoff as well. I do think that computation is becoming so much more important, and the ability to code and have these conversations will make you a much more valuable member of the workforce. I think there are really big wins to be made if you put in the time here.