scikit-learn
NumFOCUS Sponsored Project since 2020
Scikit-learn is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation.
Industry
All Industries
Language
Python, Cython
Features
Data Wrangling, Data Mining, Text Processing, Machine Learning, Modeling
Scikit-learn provides effective implementations of widely used machine learning algorithms, with a focus on supervised and unsupervised learning. In particular, it allows practitioners to quickly build complex machine learning pipelines and easily swap out different models. Scikit-learn is built on top of NumPy arrays, and therefore centers on in-memory models of homogeneous data, though some support for out-of-core computation and heterogeneous data exists. Implementations rely either on vectorized computations with NumPy or on efficient low-level implementations in Cython.
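As a rough illustration of building a pipeline and swapping its model, here is a minimal sketch. The synthetic dataset, the step names "scale" and "model", and the choice of the two classifiers are assumptions made for the example, not something this page prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, homogeneous, in-memory data held in NumPy arrays (illustrative only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A two-step pipeline: data preparation followed by a model.
# The step names "scale" and "model" are arbitrary labels chosen here.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)
logreg_accuracy = pipe.score(X_test, y_test)

# Swap in a different model without changing the rest of the pipeline.
pipe.set_params(model=DecisionTreeClassifier(random_state=0))
pipe.fit(X_train, y_train)
tree_accuracy = pipe.score(X_test, y_test)

print(f"logistic regression: {logreg_accuracy:.2f}")
print(f"decision tree:       {tree_accuracy:.2f}")
```

Because both estimators expose the same fit and score methods, the surrounding code stays unchanged when the model is replaced.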
Scikit-learn is widely used across industry and research. Applications range from finding exoplanets to detecting fraud in credit card transactions to analyzing brain imaging data. Scikit-learn is used at tech companies such as Amazon and Microsoft, as well as in manufacturing processes at companies like Mars, Inc.