line_profiler fixed and CI help required
Did you know that Kindred is looking for a Lead Data Scientist, along with quants and Python developers? Details are below, along with other senior roles, in the job adverts.
At PyData Global 2021 I’m pleased to announce two additional sessions that I’ll add to the schedule soon. A couple of weeks before the conference, a small group of us will run a trial set of “Briefing” sessions covering the state of the art in Higher Performance Python (me), NLP (Marco), Ethics in AI (Tariq) and Bayesian modeling (Thomas), with a few more names to add. You’ll see this on the schedule soon; only ticket holders will be able to join, so we humbly suggest you buy your ticket sooner rather than later to get the invite.
In addition, I’m going to run another iteration of my Executives at PyData session during the main conference. It’ll be a two-hour session aimed at addressing issues like project de-risking, communication, product design with data science, hiring, retention and contributing back to open source; it is aimed more at managerial topics than at building the best ML. Ticket holders (many of you, I’d guess) will be able to invite a colleague, such as a boss, to attend this specific session for free. It’ll run as a set of discussion topics led by me; past sessions at previous conferences have involved lively chat, with good ideas shared around solving managerial challenges. Reply to this if you want to ask me a question; I’d love to have you and/or your managerial colleagues attend.
Mani Sarkar has written asking if you’d give some feedback on an exploratory stats notebook (on random numbers and the limitations of correlation coefficients); please comment on Kaggle if you’re curious:
“Hey Ian, I have been pondering this idea about the limitations of our tools, and hence the need to reinvent them or to use existing ones with caution. I went about gathering ideas, proposals and solutions into a notebook (actually two) and here’s the link to them both via this Kaggle notebook. Here the tools in question are the maths/stats toolbox we have been using when doing Data Science and Machine Learning kinds of work. I would love to hear back from fellow members of your amazing community on what they think about what I have put together, how we could improve the ideas and existing tools, and maybe create new ones if needed.”
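As a tiny taster of the kind of limitation the notebook explores (my own sketch, not taken from Mani’s work): Pearson’s correlation coefficient only measures linear association, so a perfectly deterministic but nonlinear relationship can score near zero:

```python
def pearson(xs, ys):
    """Pearson's r, computed from scratch with no dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

xs = [x / 10 for x in range(-50, 51)]  # symmetric around zero
ys = [x * x for x in xs]               # y depends perfectly on x, but nonlinearly
print(pearson(xs, ys))  # close to zero: r misses the quadratic relationship
```

By symmetry the covariance term cancels, so r is ~0 even though knowing x tells you y exactly, which is precisely the sort of trap the notebook discusses.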
One of my favourite higher performance tools
line_profiler had a bug preventing it from working properly in Jupyter Notebooks. This was fixed last week and released to PyPI as v3.3.1 (it will follow on conda-forge soon). Update if you need to profile in a Notebook.
line_profiler has over 200,000 downloads a month from PyPI; conda-forge figures will be similar, suggesting around 5M installs a year. It is a very popular project in the data science world. One of the core developers is looking for feedback on improving the CI process, including the GPG signing flow. Maybe you can help? If so, contact the author. Being able to say you’ve helped such a popular project will boost your CV if you’re job hunting.
Core scikit-learn developer Olivier Grisel has tweeted that the example Boston house price regression dataset is to be removed from scikit-learn due to its casually and poorly designed “B” measure on neighbourhoods. I’ll confess that when I presented using this dataset back in 2018, I felt awkward discussing the “B” column’s meaning; it never occurred to me that flagging the dataset’s issues was an option. I’m foolish. It is nice to see this addressed.
Last issue the top link was the scikit-learn v1 announcement on Hacker News. Did you get to try it? If so, any joy? I’m always looking for interesting projects to share here; do you have a library or link to suggest?
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on Twitter, LinkedIn and GitHub.
Jobs are provided by readers; if you’re growing your team, reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and will go to all 1,400+ subscribers three times over six weeks; subsequent posts are charged.
As the founding team data scientist, you’ll develop Good With’s intelligent data analysis and recommendation engines, supporting voice and natural language interaction with users.
Python and open source technologies are the overarching strategic choice for the data processing, analysis, machine learning and recommendation engines.
You’ll work at the heart of a dynamic multidisciplinary agile team to develop a platform and infrastructure connecting a voice-enabled intelligent mobile app, financial OpenBanking data sources, state of the art intelligent analytics and real-time recommendation engine to deliver personalised financial guidance to young and vulnerable adults.
As a founding member, you’ll get shares in an innovative business, supported by Innovate UK and Oxford Innovation, with ambitions and roadmap to scale internationally.
Supported by Advisors: Cambridge / FinHealthTech, Paypal/Venmo & Robinhood Brand Exec, Fintech4Good CTO & cxpartners CEO.
Working with: EPIC e-health programme for financial wellbeing & ICO Sandbox for ‘user always owns data’ approaches.
The Computational Optimisation Group has a two-year research opening (either pre- or post-doctoral) in surrogate-based optimisation. The role intersects computational optimisation, machine learning, and open-source software.
Senior Python Engineer - Knowledge Graph project for a major European Bank - Semantic Partners are seeking several skilled engineers with the following skillset: Python (Django, Flask, etc.), RESTful APIs, CI/CD, containerisation (Docker, Kubernetes), NoSQL, BDD.
You’ll be joining a project team focusing on building a Knowledge Graph so an interest in Graph technologies and any experience of specific triple store systems would be a big plus, but more important is a desire to get into semantic engineering.
We believe that time is precious, so we create products, tech and services that make travel and holidays easy, simple and fun. Our purpose is clear: we offer customers less hassle so they can have more holiday. We are looking for Senior Data Engineers with big ideas. Problem solvers and collaborators who love a challenge and are always striving to improve and grow. You’ll bring everyone along on the journey with you - sharing your knowledge, inspiring others so they can improve too. The Data Team is a small but growing team meaning there’s lots of opportunity for you to get stuck in, help us progress and for you to learn and grow yourself. At an exciting time in our data journey, we’re working hard on clearing down the last of our legacy tech. We’re moving to a modern data stack; Airflow, Google BigQuery and Looker. The team’s work fuels our HEHA! App, enabling us to explode 7 data points into 1000, turning our customers’ trips into holiday experiences. Please visit the links below for further details on the opportunity.
Overton is looking for data scientists to join our small, dynamic team. Overton is a young company with big ambitions to help universities, think tanks and NGOs track how their research translates into real world policy, laws and regulations. Our platform allows users to search over 4.5m policy documents and understand how they link to each other, to academic papers and to individual authors. We use a wide range of techniques to clean and enhance our data, including entity recognition and linking, classifiers and document topic extraction as well as heuristic based approaches.
You’ll be helping with everything from developing new product updates to finding new data sources, experimenting with new ways to enrich the data and maintaining our existing pipelines. You will be fluent in Python and have experience with web scraping, machine learning pipelines and data analysis & reporting. Experience with data visualisation and front end development, familiarity with scholarly metadata and bibliometrics, and/or knowledge of the academic, think tank or research impact space are a bonus. See link for details on how to apply.
Sikoia is an ambitious new fintech building a unified data platform and API marketplace for global financial services. Our mission is to make it simpler for fintechs, lenders, and corporates to embed financial innovation and automate their decisioning, from customer onboarding through to risk underwriting.
Our founders are from Softbank, JPMorgan and Experian. With VC funding from EarlyBird and Seedcamp, plus support from top fintech CEO angel investors, we are now building a small, top quality tech/data engineering/data science team. Based remotely or at our office in central London, this is an opportunity to shape our product and technology from the very beginning.
Our tech stack is C# and Python, running in Azure. We are leveraging co-development projects with our first clients to build out our core platform. We have partnerships with UK credit bureaus and Open Banking providers, and are adding financial data vendors from many other countries. If you’re a mid/senior level Python developer or data engineer/data scientist, have fintech or SaaS experience, and are excited about fintech and financial innovation, then join us on this journey.
We are looking for an enthusiastic and talented researcher to help us build the next generation of traffic simulation systems.
You will be researching cutting-edge techniques from the fields of data science, computer science, and software development, and applying them to this domain. Your work will help to make simulation a cost-effective tool which can be used ubiquitously across the mobility ecosystem to solve a broad range of problems in transport planning, scheme design and appraisal, and operational control.
We are hiring a Data Engineer to join our team to help us set up the components of our clients’ data platforms (such as data feeds, data warehouses, ETL infrastructure, etc.). You’d also design and develop client-specific SQL data models that produce clean, structured and meaningful data sets for the business and other data functions (using dbt). And on top of that, you’d help us build ETL scripts in Python to extract data from APIs or perform pipeline transformations.
You’ll work with some of the most ambitious companies in the D2C startup ecosystem, including Pollen, On Deck, Ecosia, PensionBee, and Curio Labs. We offer a competitive salary (this is a junior role, range 45-55k) and a ton of benefits (enhanced parental leave, generous pension scheme, gym membership, refreshment allowance, home office allowance).
Beatchain is a music distribution and social media marketing and analytics platform that works with up-and-coming artists as well as established record labels. We are looking for a data engineer/scientist working in Python to help users manage, understand and put their data into context. Data sources include social media and music platforms scraped over hundreds of thousands of accounts using Scrapy, APIs including over two million Spotify playlists, and large quantities of streaming data from our distribution and record label partners.
This is a junior to mid-level role: you would be working within a small back-end team alongside the lead data scientist. While the day-to-day ingestion and transformation of data is maintained, we research ways of presenting data to users through visualizations and predictive analytics. Recently, we used graph embeddings to model relationships between artists and genres to recommend related artists for social media campaigns. We use the familiar PyData Python/Pandas/NumPy stack deployed via AWS Lambda, Step Functions and Batch. Data lives in AWS RDS, DynamoDB and Redshift, migrating to Google BigQuery.
Here at Gousto, we are on a mission to become the UK’s favourite way to eat dinner!
We’re hiring for multiple Data Science positions:
- Principal Data Scientist (Menu): https://apply.workable.com/gousto/j/3C7165186A/
- Principal Data Scientist (Supply): https://apply.workable.com/gousto/j/4709837FEC/
- Data Scientist (Growth): https://apply.workable.com/gousto/j/C9F991E124/
If you want to work on some seriously interesting projects and get discounted Gousto boxes as part of the benefits package, please apply using the links above, mentioning that this newsletter sent you there!
See here https://www.gousto.co.uk/jobs for benefits and check out our blog: https://medium.com/gousto-engineering-techbrunch
At Vivacity, we make cities smarter. We gather real-time data from our sensors to reduce congestion, spot dangerous manoeuvres on the road to improve safety, and support autonomous vehicles.
You will join our existing Product team and actively shape the product vision and technical roadmap to ensure we are constantly innovating and meeting our users’ data needs.
Kindred’s ambition is to be the most insight-driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. We are now at the next stage of our journey, embarking on an initiative to enhance our sports and racing modelling and quantitative analysis capabilities.
The Quantitative Team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. This work builds upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are looking for a software engineer with a strong interest in sporting applications and experience in building solutions to handle varied external data sources. On joining, you will be responsible for creating exceptional quality data products, primarily based on sports event and market odds data, for use within the Quantitative Team and the wider business. Your work will be integral in the team’s delivery of market-leading probability and machine learning models to support our commercial and operational functions and decision making processes.
Kindred’s ambition is to be the most insight driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. The quantitative team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. The work will build upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are now looking for a talented Quantitative Analyst to join our team to help shape our sport and racing modelling efforts. This role provides an exciting opportunity to be a pivotal part of the team. On joining, you will be responsible for performing data analysis and building probability and machine learning models to derive descriptive and predictive insight about sporting events. Your work will help to deliver market-leading tools and capabilities to support our commercial and operational functions and decision making processes.
Kindred Group use data to build solutions that deliver our customers the best possible gaming experience and we have ambitious plans to get smarter in how we use our data. As part of these plans we’re looking to recruit a Lead Data Scientist to drive our advanced analytics initiatives and build innovative solutions using the latest techniques and technologies.
• To lead, manage and deliver our advanced analytics initiatives, using cutting-edge techniques and technologies to deliver our customers the best online gaming experience
• To work in cross-functional teams to deliver innovative data-driven solutions
• To advise on best practices and keep the company abreast of the latest developments in technologies and techniques
• To build machine learning frameworks to drive personalisation and recommendations
• To build predictive models to support marketing and KYC initiatives
• To continually improve solutions through fast test-and-learn cycles
• To analyse a wide range of data sources to identify new business value
• To be a champion for advanced analytics across the business, educating the business about its capability and helping to identify use cases
Recommenders is Elsevier’s suite of recommendation systems, which uses data science and machine learning techniques to keep researchers apprised of developments in their field, new funding opportunities, potential peer reviewers and papers related to their work. We’re looking for a data engineer to help us build the pipelines that extract features from the unparalleled collection of research data flowing through our systems.
You’ll be working in a modern technology stack (AWS, Scala, Spark, Kafka, we’re currently looking at SageMaker and Kedro) as part of a small cross-functional team. If you’re interested in learning more, please contact Stuart White at the email address below.