Whilst working on a recent iteration of my Higher Performance Python course I discovered that part of why groupby operations are slow is that Pandas has to run a factorize pass over the keys before applying the groupby. If your grouping column is already Categorical-encoded then this step can be skipped, so your groupbys go faster. That’ll feature again in my next class.
You get these after e.g. calling astype('category') or by binning with a cut or qcut.
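As a minimal sketch (with made-up data, so the column names and sizes are my own assumptions), pre-encoding a low-cardinality key column as a category means the factorization happens once, up front, rather than inside every groupby call:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000_000
# A low-cardinality string key column - a typical groupby target
df = pd.DataFrame({
    "key": rng.choice(["a", "b", "c", "d"], size=n),
    "value": rng.random(n),
})

# Object-dtype groupby: pandas factorizes the keys on every call
slow = df.groupby("key")["value"].mean()

# Encode once as Categorical; later groupbys reuse the integer codes
df["key"] = df["key"].astype("category")
fast = df.groupby("key", observed=True)["value"].mean()
```

Wrap each groupby in `%timeit` to see the difference on your own data; the gain grows with the number of rows and repeated groupbys.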
The next Higher Performance Python course runs on July 21-23 via Zoom. We’ll also use profiling to understand what’s slow in your code, use Numba and vectorisation for acceleration and scale to larger data with Dask. Do you want tools that’ll make you shine at work? Early bird tickets are still available.
I’m starting to plan a new Pandas course, digging into the guts to understand what’s slow and complex and why. What might you want to see in such a class? Do you get confused with timeseries (I used to)? Do pivots and crosstabs seem overly complex on many columns of data (yup, they can be)? Do you know how to code defensively in modern Pandas? What would help you with such a course?
Pynguin - an evolutionary-algorithm-based automated test writer for Python - appeared on Hacker News recently. It is at an early stage and I’m a bit suspicious of automated test generation, but anything that helps get more tests into scientific code is probably a good thing, so I’m cautiously interested.
Here’s a nice animation of the difference between covariance and mutual information (MI is generally “better”, but I’d be happy to hear pushback if you have a strong opinion).
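A rough illustration of why MI is often called “better”: correlation only measures linear dependence, while mutual information catches any dependence. This is my own hand-rolled histogram estimate on a toy example (not taken from the linked animation), so treat the bin count and data as illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x ** 2  # perfectly dependent on x, but not linearly

# Correlation sees almost nothing: Cov(x, x^2) = E[x^3] = 0 for symmetric x
corr = np.corrcoef(x, y)[0, 1]

def mutual_information(a, b, bins=20):
    """Crude plug-in MI estimate (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0                              # avoid log(0)
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

mi = mutual_information(x, y)  # clearly positive, unlike corr
```

Histogram-based MI estimates are biased and bin-sensitive; for real work a dedicated estimator (e.g. scikit-learn’s `mutual_info_regression`) is a safer choice.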
For distractions - I really liked Newitz’s Autonomous - a near-future sci-fi book combining biohacking, (bad) performance enhancing supplements and a round trip to slavery in humans via robots. I’m currently also digging Resident Alien for streaming sci-fi.
Have you got a library you’d like to share? I’ll happily take a paragraph’s description of an interesting data science/data engineering library if it feels useful to this audience.
See recent issues of this newsletter for a dive back in time.
Jobs are provided by readers; if you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts cost £300+VAT.
Lean provides Payment and Data APIs to unlock the financial technology sector and enable financial innovation in the Middle East.
We launched our first products to market at the beginning of 2021 and now support over 90% of the retail banking market in the UAE. With ambitions to build an entire ecosystem for Fintech in the region we’re now looking to expand to new regions and support stakeholders from end-users, to Fintechs, regulators and financial institutions.
As we collect more raw data and enable an increasing variety of use cases, our data science products and processes will play an important role in Lean’s advancement within the Fintech ecosystem. We are looking for an ML Engineer with a software engineering background and a strong interest in innovative financial applications. Your role will be to extract exciting and scalable features from the river of data that flows through our system.
Here at M&S the data science function builds end-to-end AI and machine learning solutions in retail and e-commerce, helping our colleagues in Food, Clothing & Home, Fashion, Marketing, Loyalty, Supply Chain, Growth, Customer Services and more to drive value from data and create personalised experiences for our customers. We apply state-of-the-art machine learning techniques to solve a variety of problems such as outfit recommendations in fashion, personalised offers for our loyalty programme, pricing optimisation, demand forecasting for supply chain, product waste management for retail, and AI-powered campaigns for our marketing. We are hiring at both Senior and Lead levels. If you would be interested in finding out more, please contact me on the below email address.
The OVO Group is a collection of companies with a single vision: to power human progress with clean affordable energy for everyone. The data which we collect is multi-faceted and complex. There is an opportunity to become a truly AI business, with market changing innovation and finely optimised processes leading to zero carbon and low cost for our energy customers.
We are looking for a Senior Data Scientist with hands-on experience building end-to-end data science products in a production setting. Primarily you will work within cross functional, product teams, but might be required to contribute to specific data science initiatives. You will take a lead role in demonstrating the value data science can add to teams across OVO Energy, working with stakeholders to understand their data science needs, owning the delivery of projects from start to finish, and evaluating value post delivery. You will also be expected to coach junior Data Scientists and help to define data science best practices. Technology stack: SQL, Python, GCP (BigQuery, Composer, Cloud Functions, Dataflow, CloudRun), CircleCI for CI, Github for version control.
I’m looking for data-focussed software consultants on behalf of Sahaj.ai. Sahaj are a premium consultancy who focus on the intersection of data science, data engineering and platform engineering to solve complex problems for clients across a variety of industries. Their culture is built on trust and transparency, they have open salaries and there is a flat structure with no job titles or grades.
You can expect to work in small 2-5 people teams, working very closely with clients in iteratively developing and evolving solutions. You will play different roles and wear multiple hats, including analysis, solution design and coding.
You will have a passion for data and software engineering, craftsman-like coding prowess and great design and solutioning skills. You will be happy coding across the stack: front end, back end and DevOps, and have the desire and ability to learn new technologies and adapt to different situations. As a guide, you are likely to have 7-20+ years’ experience.
FreeAgent removes the stress and pain of dealing with business finances, allowing business owners to focus on running their business. Our data science and data platform teams have created a machine learning model to categorise business banking transactions that’s currently applied to over 100,000 customers in production. We have big ambitions to further our use of machine learning and artificial intelligence and you could be a part of that! We primarily work with Python/pandas/scikit-learn and use AWS SageMaker to build and deploy our models and our regular company hack days and wiggle weeks provide a great opportunity for data scientists to pursue their own ideas.
We are the fastest growing online travel agent in the UK, and we help people find their dream holiday. In the data science team, we collaborate with people across the company on business problems using programming, statistics, and machine learning.
If you love solving abstract problems, have an outstanding university degree, know SQL and Python, and have experience with machine learning, we want to hear from you!
NHS Test and Trace and the Joint Biosecurity Centre are looking for data engineers to help the UK deal with COVID. We are looking for people at junior, mid and senior level to help the UK analyse COVID data to save lives and help the UK respond. Whilst we can’t compete on salary, we have a modern cloud tech stack (AWS, Azure, Github), fascinating health datasets and some really interesting technical work that is helping the UK move forward.
We have access to the UK’s testing data and a variety of interesting related datasets, some of which pose challenges of scale and complexity for our data scientists. The challenges are likely to grow as we get more granular data through our merger with PHE. Technology stack: Azure SQL, Azure DevOps and Azure Pipelines; on AWS, Athena and SageMaker. Most code is in Python; we use Black for PEP 8 compliance and GitHub Actions for CI.
You will probably have seen some of our work in government communications over the last 12 months. These are permanent civil service roles with the associated benefits.
NHS Test and Trace and the Joint Biosecurity Centre are looking for data scientists and engineers to help the UK deal with COVID. We are looking for people at junior, mid and senior levels to help the UK analyse COVID data to save lives and help the UK respond. Whilst we can’t compete on salary, we have a modern cloud tech stack (AWS, Azure, GitHub), fascinating health datasets and some really interesting technical work that is helping the UK move forward. You will probably have seen some of our work in government communications over the last 12 months. Examples of projects include identifying new clusters of cases using network theory (NetworkX and GraphX), determining the effect of lockdowns (CausalImpact) and agent-based modelling for epidemiological models. The teams work in Python or R and there are roles that range from data analysis to running complex epidemiological models with academic partners and everything in between. These are permanent civil service roles with the associated benefits.
Kindred’s ambition is to be the most insight-driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. We are now at the next stage of our journey, embarking on an initiative to enhance our sports and racing modelling and quantitative analysis capabilities.
The Quantitative Team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. This work builds upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are looking for a software engineer with a strong interest in sporting applications and experience in building solutions to handle varied external data sources. On joining, you will be responsible for creating exceptional quality data products, primarily based on sports event and market odds data, for use within the Quantitative Team and the wider business. Your work will be integral in the team’s delivery of market-leading probability and machine learning models to support our commercial and operational functions and decision making processes.
Kindred’s ambition is to be the most insight driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. The quantitative team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. The work will build upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are now looking for a talented Quantitative Analyst to join our team to help shape our sport and racing modelling efforts. This role provides an exciting opportunity to be a pivotal part of the team. On joining, you will be responsible for performing data analysis and building probability and machine learning models to derive descriptive and predictive insight about sporting events. Your work will help to deliver market-leading tools and capabilities to support our commercial and operational functions and decision making processes.
Kindred Group use data to build solutions that deliver our customers the best possible gaming experience and we have ambitious plans to get smarter in how we use our data. As part of these plans we’re looking to recruit a Lead Data Scientist to drive our advanced analytics initiatives and build innovative solutions using the latest techniques and technologies.
• Lead, manage and deliver our advanced analytics initiatives, using cutting-edge techniques and technologies to deliver our customers the best online gaming experience.
• Work in cross-functional teams to deliver innovative data-driven solutions.
• Advise on best practices and keep the company abreast of the latest developments in technologies and techniques.
• Build machine learning frameworks to drive personalisation and recommendations.
• Build predictive models to support marketing and KYC initiatives.
• Continually improve solutions through fast test-and-learn cycles.
• Analyse a wide range of data sources to identify new business value.
• Be a champion for advanced analytics across the business, educating the business about its capability and helping to identify use cases.
At GWI we are solving the problem of how we can enable users of different levels of data expertise to interpret and draw useful insights from market research data sets. We run the largest globally harmonised market research data set across nearly fifty countries and counting, as well as an increasing range of specialised data sets. Our machine learning engineers support the development of intelligent features in the next generation of our audience insights platform, and provide solutions for custom modeling and analytics projects requested by our clients.
We have an ambitious roadmap and are looking for senior machine learning engineers to bolster our ranks. The role involves a healthy mix of research, model training, coding and deployment, as well as communicating findings to various stakeholders both internal and external. Our culture and values ensure the team is well organised and always performing at a high level. We value learning and keeping abreast of the latest research very highly so we’re applying a wide range of techniques across various branches of machine learning to the services and features we build.
We are working on a very exciting joint project with one of the largest tech companies in the world over the next 3-4 months and are looking to top up our own data engineering skills with senior experts in this field. Our tech stack is a mix of AWS and GCP for historical and project-requirement reasons, predominantly working with Python, Spark, Kinesis, S3, Lambdas and BigQuery.
We are looking for contractors who have skills in building scalable Big Data pipelines in the cloud, automating data quality testing, creating scalable architecture and then delivering its implementation. Our challenge is to strike the right balance between building a scalable, re-usable solution and iterating quickly to ensure timely project delivery. For the duration of the project contractors would join and work remotely with our existing teams.