We ran our first post-pandemic in-person PyDataLondon conference a week back in central London, bringing circa 400 data scientists and engineers together for a packed and really solid schedule. We had 3 days of events, a social with a very generous bar, spontaneous break-out sessions and lots of lively conversation. It was lovely to see folk again for the first time in 3 years. Personally I’ve been cautious having become a father during lockdown, so this was my first “big” event since lockdown…it was a bit overwhelming and I attended in chunks, avoided the crowded sections and tried to keep my mask on. I’m really glad I attended, thanks to all those who came, presented and volunteered.
I ran a session for “Executives at PyData” aimed at anyone with a leadership role. We discussed a whole pile of issues and I’ve written up some notes below. If you’re a leader, you’ll want to take a look. I’m also scheduling two follow-up Zoom calls (details below) to continue these discussions, one on Team Structure and one on Backlogs & Derisking. Reply if you’d like to attend and I’ll send you the calendar invite for the Zoom call.
Further below you’ll find 8 job listings at the likes of DeepMind and Deliveroo, going up to Staff Data Science roles.
Thanks again to my designer friend Myles for the great volunteered t-shirt design for the conference this year. If you need graphic design work, talk to Myles.
I’m pleased to say I’ve fixed the dates for my next on-line (Zoom) courses. I’ve been running these remotely with great success during our pandemic and I’m going to continue the virtual format. Early bird tickets are available and limited for each course:
I’m happy to answer questions about the above - just reply here. If you want a notification for future dates please fill in this form.
For those of you who have been waiting a while for me to get these listed - apologies, being the father to an infant has eaten a lot of time this year and I’ve had to take things sensibly before scheduling new courses.
I ran this session on Saturday morning at the conference. This is the 6th time I’ve run this (both in London and on the Global virtual conference). We vote on topics of interest to those in the room and work through them under the “Chatham House Rule” (i.e. any private details stay in the room). We had a lively discussion about Team Structure, Building a good Backlog and Derisking.
I’m going to continue these conversations in future Zoom calls - details below if you’d like an invite.
This session, like my talk and the talks by all the other speakers, was given during volunteered time on behalf of the PyData community. If you’ve benefited from these talks, do please remember to thank your volunteer speakers.
We had 50 leaders in the room which represents circa 12% of the attending conference members. This grows every year as more and more people take on leadership roles in growing Python teams.
We talked about “what makes for a good team structure and what challenges your teams?”. A great post-pandemic discussion also broke out on “how do we move from hybrid to fully-remote?”.
Several people noted success with the Spotify model, notably that the Guild model works well if it is built by passionate people and not designed by management. It was noted that the best tech leads are not the “ones needing a promotion” but those who are great at sharing knowledge. Some people ask “will this person embarrass me?” as a way to figure out if a promotion in a team is a good idea (it feels basic but it probably isn’t silly).
Several orgs have moved during Covid from an on-site or hybrid to fully-remote model with success - the key is to promote writing everything down and communicating in written form. Those who have to fight an “onsite/meeting-focused” mind-set find this difficult. 40+ of the 50 people in the room are in a hybrid environment and only a couple are on-site now. Fewer than 10 are 100% remote.
Tools mentioned for remote success include GitLab, Confluence, Tuple (Mac only, for realtime sharing), moving from GDocs to Notion, Slack, email and making sure daily verbal touchpoints are in place (to talk to all the leads). “Roulette coffee chat” was mentioned by one person as a way to get to know someone you don’t know (nice idea). What tools have worked for you?
John Sandall also added this article on tools.
One counterpoint for the hybrid environment was that anyone going on-site is more likely to meet senior leaders so they get a career advantage. There’s no easy way to avoid this and teams were preferring to move to 100% remote if possible. Have you solved this issue?
After the main discussion we had 13 staying for a further chat on Team Structure during lunch, in part focused on those moving from a hybrid to fully remote setup (we’ll explore this in one of the future calls - see below).
A good backlog is essential and can’t be built “when you’ve run out of stuff to do”. To my mind it is an essential activity for the whole team to partake in, spreading into the wider organisation and learning who has pains and the desire to solve them.
One key is building relationships - that takes time and effort. You can then find pains and explain if their problem is worth a DS solution. Embedding people in product teams means you can learn the roadmap for those teams. Good collaboration partners are probably willing to commit resources early (so it is less about what they want, and more about what they’re willing to commit). Go look for problems in that organisation and be willing to piggy-back some DS projects on the back of these (as these projects are probably important to the org). Always wrap everything up in a discussion on business value.
Some people time-box strongly to avoid project creep - but that only works if you’ve had prior success to win trust. Getting granular commitment to projects, building up from easier stages, showing value incrementally, is a very sensible way to advance a bigger idea. If you commit to too-large-a-thing and it fails, you erode trust which is hard to win back.
Think more about “this isn’t a DS project, it is a ‘reduce overbilling project that might happen to use some DS’”. Deliver something of value which works end-to-end (even if it just uses a random number generator) in the first iteration to show progress. Derisk the apparent cost of a project by getting a supported intern to take the first steps. Fail fast with quick PoCs to show that risk is reducing.
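To make the “works end-to-end with a random number generator” idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Invoice` type, the hard-coded rows in `load_invoices`, the `RandomScorer` placeholder and the 0.5 threshold are my assumptions, not something shown in the session. The point is only that loading, scoring and reporting are all wired together on day one, so a real model can later replace the random one without touching the rest.

```python
import random
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    amount: float


def load_invoices():
    """Stand-in for the real data source (hypothetical hard-coded rows)."""
    return [
        Invoice("INV-001", 120.0),
        Invoice("INV-002", 4300.0),
        Invoice("INV-003", 87.5),
    ]


class RandomScorer:
    """Placeholder 'model': returns a random overbilling score in [0, 1)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def score(self, invoice):
        # Ignores the invoice entirely; a real model would use its fields.
        return self._rng.random()


def flag_for_review(invoices, scorer, threshold=0.5):
    """Score every invoice and keep those at or above the threshold."""
    scored = [(inv, scorer.score(inv)) for inv in invoices]
    return [(inv.invoice_id, round(s, 3)) for inv, s in scored if s >= threshold]


def report(flagged):
    """Stand-in for the delivery step (email, dashboard, ticket...)."""
    return [f"{invoice_id}: review (score {score})" for invoice_id, score in flagged]


if __name__ == "__main__":
    for line in report(flag_for_review(load_invoices(), RandomScorer())):
        print(line)
```

Any object with a `score(invoice)` method can be passed to `flag_for_review`, so the second iteration only has to supply a better scorer while stakeholders keep seeing the same end-to-end output.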
One of the ways I know I bring value to a project is to help the team type less and talk more to figure out if we’re actually solving a valuable problem. Often you can lose sight of the big picture when challenged by a detail and knowing when to step back and maybe pivot can be crucial to maintain momentum.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
I’m going to run two virtual events focused on getting to more success. Please reply if you’d like a calendar invite to either of these:
My Success course deals with some of the above topics. I’m hoping that we can explore existing pains and solutions in these public calls and maybe some of you would like to join the subsequent course for a deeper dive into valuable tools and processes.
See recent issues of this newsletter for a dive back in time. Subscribe via the NotANumber site.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on Twitter, LinkedIn and GitHub.
Jobs are provided by readers. If you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts are charged.
The Royal Botanic Gardens, Kew (RBG Kew) is a leading plant science institute, UNESCO World Heritage Site, and major visitor attraction. Our mission is to understand and protect plants and fungi for the well-being of people and the future of all life on Earth.
Kew’s new Plants for Health initiative aims to build an enhanced resource for data about plants used in food supplements, allergens, cosmetics, and medicines to support novel research and the correct use and regulation of these plants.
We are looking for a Data Scientist with experience in developing data mining tools to support this. The successful candidate’s responsibilities will include developing semi-autonomous tools to mine published literature for key medicinal plant data that can be used by other members of the team and collaborators at partner institutes.
IndexLab is a new research and intelligence company specialising in measuring the use of AI and other emerging technologies. We’re setting out to build the world’s first index to publicly rank the largest companies in the world on their AI maturity, using advanced data gathering techniques across a wide range of unstructured data sources. We’re looking for an experienced Data Engineer to join our team to help set up our data infrastructure, put data gathering models into production and build ETL processes. As we’re a small team, this role comes with the benefit of being able to work on the full spectrum of data engineering tasks, right through to the web back-end if that’s what interests you! This is an exciting opportunity to join an early stage startup and help shape our tech stack.
We are looking for Staff, Senior Staff & Principal ML Engineers to design and build algorithmic and machine learning systems that power Deliveroo. Our MLEs work in cross-functional teams alongside engineers, data scientists and product managers, who develop systems that make automated decisions at a massive scale.
We have many problems available to solve across the company, including optimising our delivery network, optimising consumer and rider fees, building recommender systems and search and ranking algos, detecting fraud and abuse, time-series forecasting, building a ML platform, and more.
The Regulatory Genome Project (RGP), part of the Cambridge Centre for Alternative Finance, was set up in 2020 to promote innovation by unlocking information hidden on regulators’ websites and in PDFs. We’re a commercial spin-out from The University of Cambridge’s Judge Business School and our proposition is to make the world’s regulatory information machine-readable and thereby enable an active ecosystem of partners, law firms, standard-setting bodies and application providers to address the world’s regulatory challenges.
We’re looking for a data scientist to join our remote-friendly technical team of software engineers, machine learning experts, and data scientists who’ll work closely with skilled regulatory analysts to engineer features and guide the work of a dedicated annotation team. You’ll help develop, train, and evaluate information extraction and classification models against the regulatory taxonomies devised by the RGP as we scale our operations from 100 to over 600 publishers of regulation worldwide.
The Met is looking for an analyst and a lead analyst to join its Strategic Insight Unit (SIU). This is a small, multi-disciplinary team that combines advanced data analytics and social research skills with expertise in, and experience of, operational policing and the strategic landscape.
We’re looking for people who are able to work with large datasets in R or Python and who care about using empirical methods to answer the most critical public safety questions in London! We’re a small, agile team who work throughout the police service, so if you’re keen to do some really important work in an innovative, evidence-based but disruptive way, we’d love to chat.
An exciting opportunity has arisen for a Principal Population Health Analyst to join the Population Health and Care team at Lewisham and Greenwich Trust (LGT) where the post holder will be instrumental in leading the analytics function and team for Lewisham’s Population Health and Care system.
Lewisham is the only borough in South East London to have a population health management information system (Cerner HealtheIntent) that is capable of driving change, innovation and clinical effectiveness across the borough. The post-holder will therefore work closely with public health consultants, local stakeholders and third-party consultancies to explore epidemiology through the use of HealtheIntent, and design new models of transformative care that will deliver proactive and more sustainable health care services. LGT is therefore seeking an experienced Principal Population Health Analyst who is equally passionate about transforming and improving the lives and care of patients through data analytics and can draw key and actionable insights from our data. The successful candidate will be an experienced people manager with strong communication skills, able to lead a team of analysts and manage the provision of data analytics to a diverse range of stakeholders across Lewisham, with a particular focus on population health, bringing together best practice and innovative approaches.
We are looking for Data Research Engineers to join DeepMind’s newly formed Data Team. Data is playing an increasingly crucial role in the advancement of AI research, with improvements in data quality largely responsible for some of the most significant research breakthroughs in recent years. As a Data Research Engineer you will embed in research projects, focusing on improving the range and quality of data used in research across DeepMind, as well as exploring ways in which models can make better use of data.
This role encompasses aspects of both research and engineering, and may include any of the following: building scalable dataset generation pipelines; conducting deep exploratory analyses to inform new data collection and processing methods; designing and implementing performant data-loading code; running large-scale experiments with human annotators; researching ways to more effectively evaluate models; and developing new, scalable methods to extract, clean, and filter data. This role would suit a strong engineer with a curious, research-oriented mindset: when faced with ambiguity your instinct is to dig into the data, and not take performance metrics at face value.
Join us in our mission to help tackle climate change, one of the biggest systemic threats facing the planet today. We are a start-up providing analytics and software to assist companies in navigating climate uncertainty and transitioning to net zero. We apply research frameworks pioneered by the Centre for Risk Studies at the University of Cambridge Judge Business School and are already engaged by some of Europe’s biggest brands. The SaaS product that you will be working on uses cloud and Python technologies to store, analyse and visualize an organization’s climate risk and to define and monitor net-zero strategies. Your focus will be on full stack web development, delivering the work of our research teams through a scalable analytics platform and compelling data visualization. The main tech-stack is Python, Flask, Dash, Postgres and AWS. Experience of working with scientific data sets and test frameworks would be a plus. We are recruiting developers at both junior and senior levels.