Did you know that Netacea are hiring for a Head of DSci + Data Engineer, Signal need a Head of DSci, Aflorithmic need a Data Eng and 2iQresearch are hiring for a Senior Dev and a Quant Dev? VivacityLabs have a Special Projects role and Inawisdom need a Software Engineer. Details for all of these and more are down below.
I spent a really interesting hour with Richard Pelgrim of Coiled last week talking about the state of Higher Performance Python and Dask. Richard helps folk onboard to Dask, so he had some great observations about blockers and diagnostics that help new users.
I spoke about the usual suspects - numexpr to speed up Pandas - and briefly mentioned Vaex. Richard gave a little live demo of some of the latest changes in Dask’s dashboard diagnostics. The video is available if you fill in the access form on Coiled’s blog in the link above.
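As a hedged illustration of the numexpr route (the frame and numbers here are invented for the sketch; pandas quietly falls back to its Python engine if numexpr isn’t installed):

```python
import numpy as np
import pandas as pd

# Toy frame (an assumption for illustration) - large enough that
# numexpr's single-pass evaluation avoids allocating temporaries.
df = pd.DataFrame({"a": np.arange(1_000_000, dtype="float64"),
                   "b": np.arange(1_000_000, dtype="float64")})

# Equivalent to df["a"] * 2 + df["b"]; pandas hands the whole
# expression to numexpr when it is available, which can be
# noticeably faster and lighter on RAM for big frames.
result = df.eval("a * 2 + b")
print(result.iloc[-1])  # 2999997.0
```

The win comes from evaluating the expression in one pass rather than materialising `df["a"] * 2` as an intermediate array first.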
When I teach my Higher Performance course I’m always looking for new tips. I hadn’t realised that the Dashboard has new diagnostics: you can now drill into the unmanaged memory in Dask subprocesses, and the blog entry gives some tips on reducing RAM usage. There’s also a nice JupyterLab workflow which embeds the diagnostics into panes of the Lab - again, this is something I want to try.
One of the attendees mentioned pprofile, a thread-aware equivalent to line_profiler. Do any of you have experience using it? I’ve never tried it, and I’m wondering what it could reveal for quantitative work.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
I’ve listed another iteration of my Success course for February 9th & 10th; it aims to give you the tools you need to derisk projects and increase the likelihood that you deliver on time and with happy clients. Send me an email if you have questions.
Gael Varoquaux, core dev for sklearn, has released a new keynote talk (20 mins) on the state of scikit-learn. He notes that the typical dataset size for data scientists is below terabyte scale (via KDnuggets), that dataset sizes were pretty stable between 2013 and 2018, and that there are a million monthly users. He cites a Kaggle survey on “what holds you back” (dirty data, lack of talent and lack of managerial buy-in being the top issues) and talks about how to get sklearn to scale to larger sets of machines, beyond its original goals.
joblib (which I teach in my course - a brilliant parallelisation tool!) gets a call-out, and I learned that it can even push work out to Dask. He finishes with some results on his dirty-text-data library dirty-cat and notes that missing-data support is now wired into the HistGradientBoosting estimators in sklearn. It is well worth a watch.
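The joblib pattern is tiny; a minimal sketch (slow_square is invented for illustration, and `prefer="threads"` keeps it self-contained - drop it for process-based workers):

```python
from joblib import Parallel, delayed

def slow_square(x):
    # Stand-in for an expensive per-item computation.
    return x * x

# Fans the calls out across two workers; threads are fine for a sketch.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(slow_square)(i) for i in range(8)
)
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]

# With dask.distributed installed, the same call can target a cluster
# instead (an assumption based on joblib's Dask backend support):
#   from joblib import parallel_backend
#   with parallel_backend("dask"):
#       Parallel()(delayed(slow_square)(i) for i in range(8))
```

That’s the appeal: the task code doesn’t change, only the backend you wrap the call in.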
With one of my clients recently we’ve been reviewing methods to rank and score risky new projects. Counting the amount of time already spent in an organisation (e.g. headcount and days-spent) is a pretty solid way of looking for automation wins. On top of this we can estimate how much of the manual work could be automated away and what that is worth (if you automate a lot of expensive line-items, the $ can quickly add up!). We’ve then been asking “what happens if we relax some of these rules, beyond what a human can reasonably check - how much more do we find?”.
Do you use techniques like this to derisk new projects? What sort of success (or frustration) have you had? I’ll include some exercises on this in the next iteration of my Success course.
A nice tweet thread appeared on “productivity hacks” - not silly things, but solid basic steps that probably benefit everyone. “Get enough sleep, go for a walk, eat well” etc. I know that some colleagues have found themselves falling back into pre-pandemic negative patterns and they’ve wished “give me another lockdown so I can relax” (which rather misses the wood for the trees). You might find some of the listed suggestions are useful to help keep your mind and body in a good shape.
Personally I’ve continued to limit my client engagement days so I have more time for my family - settling my infant into his new nursery took a fair bit of effort (and lost sleep). Repeated nursery illnesses keep nipping at our health. Building in some down-time to mitigate these effects, to maintain energy levels and happiness, just seems to be basic good sense.
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on twitter, LinkedIn and GitHub.
Jobs are provided by readers, if you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks, subsequent posts are charged.
Netacea is an industry-leading provider of bot detection & mitigation capabilities to businesses struggling with automated threats against their websites, apps and APIs. We ingest and predict on vast quantities of streamed real-time data, sometimes millions of messages per second. As a successful start-up that is now scaling up substantially, having robust and high-quality data pipelines is more important than ever. We are looking for an experienced data engineer with a passion for technology and data to help us build a stable and scalable platform.
You will be part of a strong and established data science team, working with another data engineer and with our chief technical architect to research, explore and build our next generation pipelines & processes for handling vast quantities of data and applying our state-of-the-art bot detection capabilities. You will get the opportunity to explore new technologies, face unique challenges, and develop your own skills and experience through training opportunities and collaboration with our other highly skilled delivery teams.
We have open positions for two mid-level data scientists on our team at Netacea. You will be joining a strong and established team of data scientists and data engineers, working on unique problems at a vast scale. You will be building an industry-leading bot detection product, solving new emerging threats for our customers, and developing your own skills and experience through training opportunities and collaboration with our other highly skilled delivery teams.
We also have two Lead Data Scientist roles with one of these specialised towards supporting long-term technical customer relationships. Both Lead roles will be fundamental to the success and growth of the data science function at Netacea. You will be a technical leader, driving quality and innovation in our product, and supporting a highly competent team to deliver revolutionary data science for our customers.
Application links: Lead Data Scientist (Commercial) - https://apply.workable.com/netacea-1/j/4B7ACCC80D/?utm_medium=social_share_link Lead Data Scientist - https://apply.workable.com/j/F3A4E8F82F/?utm_medium=social_share_link Data Scientist - https://apply.workable.com/j/D58EA8DCE2/?utm_medium=social_share_link
Netacea is a Manchester-based business providing revolutionary products, including a website queuing system that prevents traffic surges from causing website failure, and a bot management solution that protects websites, mobile apps and APIs from heavy traffic and malicious attacks such as scraping, credential stuffing and account takeover. Netacea was recently categorised by Forrester as a leader in this rapidly expanding market.
We are looking for an outstanding leader to spearhead the growth and development of our data science team. As Head of Data Science, you will lead a department of skilled engineers to deliver outstanding solutions to the most interesting problems in cybersecurity. You will feel comfortable working in an agile way, taking ownership of data science strategy, effectiveness, delivery, and quality. You will grow, nurture, and develop your team and encourage them to explore their full potential. This is a mainly hands-off role, but you should feel confident talking about data science technology with internal and external stakeholders and partners. You will be passionate about data, and understand how it can be used to deliver value to customers.
You will be a core player in the growth of our platform. You will work within one of our platform teams to innovate, collaborate, and iterate in developing solutions to difficult problems. Our teams are autonomous and cross-functional, encompassing every role required to build and improve on our products in whatever way we see best. You will be hands-on working on end-to-end product development cycles from discovery to deployment. This encompasses helping your team discover problems and explore the feasibility and value of potential ML-driven solutions; building prototype solutions and conducting offline and online experiments for validation; collaborating with engineers and product managers on bringing further iterations for those solutions into the products through integration, deployment and scaling.
This particular role will initially be within a team whose responsibilities include effectiveness and efficiency of our labelling processes and tool, training, monitoring and deployment of systems and models for entity linking, text classification and sentiment analysis, among others, across multiple data types. This team also works closely with the operation teams to ensure systems and models are properly maintained.
We’re an audio as a service startup, building an API first solution to add audio to applications. We have customers and we’re fast growing.
As an Audio-as-a-Service, API-first voice tech company, our aim is to democratise the way audio is produced. We use AI and “Deepfake for Good” to create beautiful voice and audio from simple text-to-speech - making creating beautiful audio content (from simple text) as easy as writing a blog. Join a 23-person international engineering, voice, R&D and business team made up of 13 nationalities (backgrounds include: ex-University of Edinburgh, PhDs, European Space Agency, SAP, Amazon).
We’re looking for a data engineer to work on the core data pipelines for our voice-as-a-service and to support our growing team. Our stack includes Kubernetes, Python and NodeJS, and we make heavy use of Kubeflow and the Serverless stack.
At Vivacity, we make cities smarter. Using Reinforcement Learning techniques at the forefront of academic and research thinking, our award winning teams optimise traffic lights to prioritise cyclists and improve air quality. Our work makes a real difference to real people using ‘privacy by design’ principles.
We’re looking for a confident developer / ML engineer, who is comfortable working in an adaptive setting: get familiar with complex concepts, implement accurately, and communicate your plans effectively with various stakeholders. We’d like to see 1-2 years of industry experience in a relevant field. Our software is in many modern programming languages (Python, Golang, C++ etc) so you will need a willingness to learn. We’d also like to see good capability with Python or Golang.
Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. Built originally in Python for working with NumPy arrays, Zarr is now supported in more than half a dozen languages. With funding from the Chan Zuckerberg Initiative, we are looking to hire a full-time, open-source enthusiast for two years to work as our community manager.
NumFOCUS is seeking a Scientific Software Developer to support the SunPy project. SunPy is a Python-based open source scientific software package supporting solar physics data analysis. Contract is available for U.S. residents only. This is a 1-year contract but work may be completed in less time.
The primary role of the Project Jupyter Community Events Manager will be to manage two event programs: JupyterCon and Jupyter Community Workshops. In conjunction with NumFOCUS and Project Jupyter leadership, you will create and implement a strategy to connect the international Jupyter community through both online and in-person events.
Inawisdom are a Data Science & Machine Learning Consultancy and AWS Premier Partner. We are looking for mid+ level Python developers with AWS experience (or OO programmers with AWS who are willing to learn Python, or vice versa!) for a permanent role. This is an exciting opportunity for someone to make an impact implementing and delivering cloud-native solutions and serverless applications in a Data Science business. You will be required to develop software with the latest and greatest tech for high-profile, enterprise clients.
• Knowledge of functional and object-oriented programming. • Knowledge of synchronous and asynchronous programming. • 2 or more years developing in Python 2.6 or 3.x. • Experience in using Python frameworks (e.g. Flask, Boto3). • Familiarity with Amazon Web Services (AWS) and REST APIs. • Understanding of databases and SQL. • Understanding of NoSQL databases. • Experience in unit testing and TDD.
Desirable requirements: • Experience in AWS serverless services (Lambda, API GW, SNS, SQS, and Dynamo DB). • Has developed solutions using AWS SAM or the Serverless Framework and defined APIs in Swagger.
We are looking for an experienced Python Developer with a strong background in Finance to join us as one of our first engineers in the core team.
You will play a key role in designing and maintaining analytics/predictions and visualizations for our new data platform, “Alpha Terminal.” It bundles 2iQ’s data and analytics into one easy-to-use product, offering fundamental investors a range of powerful insights.
Responsibilities: working with the Quant and Product teams to design, build and manage critical infrastructure while automating everything with code. Initially, this role will be based in our Lisbon office; however, there is the potential for flexible working arrangements in the future. The role may suit an individual who is looking for a change of scenery or a better work-life balance.
Requirements: • Experience in a DevOps or software engineering role. • Strong background with Linux, K8s and Docker (or another container technology). • High proficiency in a language such as Python, Java, or Go.
Nice to have: • Cloud or Big Data experience (Elastic, Aerospike, ClickHouse, KDB+, …). • Experience with message buses. • Spark and/or Dask knowledge.
We are seeking a highly talented Quantitative Developer with a solid background in Python to join our platform analytics team. In this role, you will help implement, support, and run the hybrid compute infrastructure that manages all research and production workloads.
You will work closely with the Quant and Product teams to support and develop code that runs in our production systems. These systems are the building blocks of the “Alpha Terminal”, a tool for fundamental investors to explore the market. You will also build and optimise data analytics services, as well as integrating data to support the quantitative team. Adapting research prototypes of models to the production environment is also a key responsibility of this role. This role is based in our Lisbon office; however, flexible working arrangements, as well as a hybrid-model transition period, are available for all candidates.
Requirements: • Experience in numerical Python and SQL. • Working knowledge of the Pandas/NumPy libraries. • Dask and/or Spark knowledge. • CI/CD knowledge.
Nice to have: • Docker (or other containerisation) knowledge. • Cloud or Big Data experience (Parquet, PyArrow, Aerospike, ClickHouse, KDB+, …). • Knowledge of AI/ML libraries (TensorFlow, PyTorch, scikit-learn, …).
Over half a billion videos are watched across millions of websites on a JW Player video player every day. Our product teams leverage data coming from our player to measure success, prioritize our next steps, and envision new possibilities for the thousands of video publishers we serve daily across the web. We iterate quickly, conduct frequent experiments as part of product development, and seek to be data driven in everything we do.
As a Product Analyst on the JW Player Data Science & Product Analytics team, you will work closely with product managers, engineers, and data scientists to develop insights that inform product decisions and strategy. Your findings will impact the next generation of JW Player products, from our flagship video player and video platform to our video recommendations service and other data products. You’ll play a critical role in improving these products and guiding our future development efforts.
JW Player powers billions of video plays every week across a wide spanning web of broadcasters and video publishers with a diverse set of audiences and content types. Leveraging the vast stream of data sent by our flagship player, the Data Science team works in close collaboration with adjacent teams to improve our existing products, drive sound decision making, and develop new data products that bring value to our customers in both the video publishing and video advertising spaces. We iterate quickly, conduct frequent experiments, and seek to be data driven in everything we do.
As a Senior Data Scientist at JW Player, you will be joining a collaborative, creative, multidisciplinary team of scientists, engineers, and data analysts responsible for research and development, product analytics, and running production machine learning models that make tens of millions of predictions every day.
At Rasa we’re hiring for a bunch of engineering roles. We’re a friendly, remote company with many interesting problems to solve. We’re building open-source tools that are used globally to build virtual assistants. Want to invest in developer experience, Non-English NLP and scalable machine learning? Then there’s a lot to do!
Feel free to reach out to Vincent @fishnets88 if you have any questions.
Carbon Re is an AI research and development company dedicated to removing Gigatons of CO2 (equivalent) from humanity’s emissions each year. We aim to do so by optimizing production processes, redesigning manufacturing systems, developing new control processes, and accelerating the development of new climate-friendly materials and systems. Carbon Re is an equal opportunity employer. We are still a small team and are committed to growing in an inclusive manner.
As Principal Data Scientist, your key role will be to establish, define and implement data science solutions in order to deliver business value by making the optimal decisions to ensure efficient and cost-effective performance. You will build data science tools, providing business experts throughout Gas Transmission (GT) with the technology and expertise to unlock and exploit the information we hold to support the effective running of the business.