Data sets automatically analyzed, annotated, and organized online

Machine learning pipelines automatically shared from many libraries

Extensive APIs to integrate OpenML into your own tools and scripts

Reproducible results (e.g. models, evaluations) for easy comparison and reuse

Collaborate in real time, right from your existing tools

Make your work more visible, reusable, and easily citable

Open source tools to automate experimentation and model building


OpenML operates on a number of core concepts which are important to understand:

Datasets are straightforward: they consist of a number of rows, also called instances, usually in tabular form.
Example: The iris dataset

A task consists of a dataset, together with a machine learning task to perform, such as classification or clustering, and an evaluation method. For supervised tasks, this also specifies the target column in the data.
Example: Classifying different iris species from other attributes, evaluated using 10-fold cross-validation.

A flow identifies a particular machine learning algorithm from a particular library or framework such as Weka, mlr or scikit-learn. It should at least contain a name, details about the workbench and its version and a list of settable hyperparameters. Ideally, the appropriate workbench can deserialize it again (the algorithm, not the model).
Example: WEKA's RandomForest

A run is a particular flow (that is, an algorithm) with a particular parameter setting, applied to a particular task.
Example: Classifying irises with WEKA's RandomForest

How to add instances of Data, Flows, Tasks and Runs is defined in the OpenML definition.
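To make the relationship between the four concepts concrete, here is a minimal sketch in Python. The class and field names are hypothetical and chosen only for illustration; they are not the OpenML schema or any OpenML client API. The library version and hyperparameter name are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    rows: List[Dict[str, object]]  # one dict per instance


@dataclass
class Task:
    dataset: Dataset
    task_type: str                # e.g. "classification" or "clustering"
    evaluation: str               # e.g. "10-fold cross-validation"
    target: Optional[str] = None  # only set for supervised tasks


@dataclass
class Flow:
    name: str
    library: str
    library_version: str
    hyperparameters: List[str] = field(default_factory=list)


@dataclass
class Run:
    flow: Flow                    # the algorithm
    task: Task                    # what it was applied to
    parameter_settings: Dict[str, object] = field(default_factory=dict)


# The running example from the text: classifying irises with WEKA's RandomForest.
iris = Dataset("iris", [{"sepallength": 5.1, "class": "Iris-setosa"}])
task = Task(iris, "classification", "10-fold cross-validation", target="class")
flow = Flow("RandomForest", "WEKA", "x.y.z", ["num_trees"])  # placeholder values
run = Run(flow, task, {"num_trees": 100})
```

Note how a run only makes sense once a flow, a parameter setting and a task are all fixed, and how the task in turn pins down the dataset and the target.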


You can upload and download datasets through the website, or APIs. Data hosted elsewhere can be referenced by URL.

Data consists of columns, also known as features or covariates, each of which is either numeric, nominal or a string, and has a unique name. A column can also contain any number of missing values.


Most datasets have a "default target attribute" which denotes the column that is usually the target, also known as dependent variable, in supervised learning tasks. The default target column is denoted by "(target)" in the web interface. Not all datasets have such a column, though, and a supervised task can pick any column as the target (as long as it is of the appropriate type).

Example: The default target variable for the MNIST data is the class, to be predicted from the pixel values. OpenML also allows you to create a task that tries to predict the value of pixel257 given all the other pixel values and the class column. As such, the class is also considered a feature in OpenML terminology.

OpenML automatically analyzes the data, checks for problems, visualizes it, and computes data characteristics, also called data qualities (including simple ones like number of features, but also more complex statistics like kurtosis or the AUC of a decision tree of depth 3). These data qualities can be useful to find and compare datasets.
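As a rough illustration of what such qualities look like, the sketch below computes a few simple counts and the excess kurtosis of a numeric column using only the standard library. The function names and dictionary keys are hypothetical, and the kurtosis definition used here (population moments, minus 3) is an assumption; OpenML may use a different estimator.

```python
from typing import Dict, List, Optional


def excess_kurtosis(values: List[float]) -> float:
    """Fourth central moment over squared variance, minus 3 (population form).

    Raises ZeroDivisionError for an empty or constant column.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def simple_qualities(columns: Dict[str, List[Optional[float]]]) -> Dict[str, int]:
    """Count features, instances and missing values (None marks a missing value)."""
    n_instances = len(next(iter(columns.values()))) if columns else 0
    n_missing = sum(v is None for col in columns.values() for v in col)
    return {
        "number_of_features": len(columns),
        "number_of_instances": n_instances,
        "number_of_missing_values": n_missing,
    }
```

Qualities like these are cheap to compute once per dataset and can then be used to filter or compare datasets without downloading them.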

Every dataset gets a dedicated page with all known information (check out zoo), including a wiki, visualizations, statistics, user discussions, and the tasks in which it is used.


OpenML currently only supports uploading of ARFF files. We aim to extend this in the near future, and allow conversions between the main data types.
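If your data is not yet in ARFF, a small converter is often enough for simple tables. The sketch below writes a minimal ARFF file from in-memory rows. The function name `to_arff` is hypothetical, and the sketch assumes attribute names and values contain no spaces, commas or quotes (real ARFF requires quoting in those cases); missing values are written as `?`.

```python
from typing import List, Optional, Sequence, Tuple, Union

# An attribute is (name, "numeric"), (name, "string"),
# or (name, [list of nominal values]).
Attribute = Tuple[str, Union[str, Sequence[str]]]


def to_arff(relation: str,
            attributes: List[Attribute],
            rows: List[Sequence[Optional[object]]]) -> str:
    """Render a header and data section in ARFF syntax (no quoting/escaping)."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            # Nominal attributes list their allowed values in braces.
            lines.append(f"@attribute {name} " + "{" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    "iris_sample",
    [("sepallength", "numeric"), ("class", ["Iris-setosa", "Iris-virginica"])],
    [(5.1, "Iris-setosa"), (None, "Iris-virginica")],
)
```

For anything beyond toy data, an established ARFF library is a safer choice than hand-rolled output.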


A dataset can be uniquely identified by its dataset ID, which you can find in the URL of the dataset page, such as 62 for zoo. Each dataset also has a name, but several datasets can have the same name. When several datasets have the same name, they are called "versions" of the same dataset (although they are not necessarily versions of the same data). The version number is assigned according to the order of upload. Different versions of a dataset can be accessed through the drop-down menu at the top right of the dataset page.



Each dataset has a status, which can be "active", "deactivated" or "in_preparation". When you upload a dataset, it will be marked "in_preparation" until it is approved by a site administrator. Once it is approved, the dataset will become "active". If a severe issue has been found with a dataset, it can become "deactivated". By default, the search will only display datasets that are "active", but you can access and download datasets with any status.

Ignored features

Features in datasets can be tagged as "ignored" or "row id". Those features will not be considered by programming interfaces, and are excluded from any tasks.
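In practice this means that a client drops those columns before handing the data to a learner. A minimal sketch of that filtering step, with a hypothetical function name and hypothetical column names:

```python
from typing import Iterable, List


def usable_features(feature_names: Iterable[str],
                    ignored: Iterable[str] = (),
                    row_id: Iterable[str] = ()) -> List[str]:
    """Return the feature names that remain after dropping ignored and row-id columns."""
    excluded = set(ignored) | set(row_id)
    return [name for name in feature_names if name not in excluded]


columns = ["id", "animal", "hair", "legs", "type"]
kept = usable_features(columns, ignored=["animal"], row_id=["id"])
```

Row identifiers in particular should never reach a model, since they can leak the row order or the target.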


Tasks describe what to do with the data. OpenML covers several task types, such as classification and clustering. You can create tasks online.

Tasks are little containers including the data and other information such as train/test splits, and define what needs to be returned.

Tasks are machine-readable so that machine learning environments know what to do, and you can focus on finding the best algorithm. You can run algorithms on your own machine(s) and upload the results. OpenML evaluates and organizes all solutions online.


Tasks are real-time, collaborative data mining challenges (e.g. see this one): you can study, discuss and learn from all submissions (code has to be shared), while OpenML keeps track of who was first.


More concretely, tasks specify the dataset, the kind of machine learning task (e.g. regression), the target attribute (i.e. which column in the dataset should be predicted), the number of splits for cross-validated evaluation and the exact dataset splits, as well as an optional evaluation metric (e.g. mean squared error). Given this specification, a task can be solved using any of the integrated machine learning tools, like Weka, mlr and scikit-learn.
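Fixing the exact splits is what makes results from different tools comparable: everyone trains and tests on the same instances. The sketch below shows the general idea with a seeded k-fold split. It is only an illustration under that assumption, not the procedure OpenML uses to generate its splits, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_instances: int, n_folds: int,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) per fold, reproducible for a given seed."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# 10-fold cross-validation on a dataset with 150 instances, such as iris.
splits = k_fold_splits(150, 10)
```

Because the shuffle is seeded, calling the function again with the same arguments yields identical folds, which is the property a shared task needs.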




Flows are algorithms, workflows, or scripts solving tasks. You can upload them through the website or the APIs. Code hosted elsewhere (e.g., GitHub) can be referenced by URL, though flows are typically generated automatically by machine learning environments.


Every flow gets a dedicated page with all known information (check out WEKA's RandomForest), including a wiki, hyperparameters, evaluations on all tasks, and user discussions.



Each flow specifies requirements and dependencies, and you need to install these locally to execute a flow on a specific task. We aim to add support for VMs so that flows can be easily (re)run in any environment.


Runs are applications of flows to a specific task. They are typically submitted automatically by machine learning environments (through the OpenML APIs), with the goal of creating a reproducible experiment (though exactly reproducing experiments across machines might not be possible because of changes in numeric libraries and operating systems).

OpenML organizes all runs online, linked to the underlying data, flows, parameter settings, people, and other details. OpenML also independently evaluates the results contained in the run given the provided predictions.
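Evaluating from predictions rather than trusting self-reported scores is simple to picture. The sketch below computes accuracy per cross-validation fold from a list of predictions. The tuple layout and function name are assumptions made for this example and do not reflect OpenML's actual prediction file format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (fold number, true label, predicted label); layout is hypothetical.
Prediction = Tuple[int, str, str]


def accuracy_per_fold(predictions: List[Prediction]) -> Dict[int, float]:
    """Fraction of correct predictions within each fold."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for fold, truth, predicted in predictions:
        total[fold] += 1
        correct[fold] += int(truth == predicted)
    return {fold: correct[fold] / total[fold] for fold in sorted(total)}


scores = accuracy_per_fold([
    (0, "Iris-setosa", "Iris-setosa"),
    (0, "Iris-virginica", "Iris-setosa"),
    (1, "Iris-setosa", "Iris-setosa"),
])
```

Since every prediction is kept, any other metric can be computed later from the same stored data.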

You can search and compare everyone's runs online, download all results into your favorite machine learning environment, and relate evaluations to known properties of the data and algorithms.


OpenML stores and analyzes results in fine detail, up to the level of individual instances.

!!! Want to read more? A more detailed description can be found in this blogpost.


You can download and inspect all datasets, tasks, flows and runs through the website or the API without creating an account. However, if you want to upload datasets or experiments, you need to create an account or sign in and create an API key. This key can then be used with any of the OpenML APIs.





If you want to integrate OpenML into your own tools, we offer several Language-specific APIs, so you can easily interact with OpenML to list, download and upload datasets, tasks, flows and runs.


OpenML also offers a REST API which allows you to talk to OpenML directly.
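As a rough sketch of what talking to OpenML directly over HTTP could look like, the helper below builds the URL for a dataset description. The base URL, the `data/{id}` path and the `api_key` query parameter are assumptions here; please check the API documentation for the authoritative endpoints before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL of the JSON endpoint; verify against the API documentation.
BASE_URL = "https://www.openml.org/api/v1/json"


def dataset_url(dataset_id: int, api_key: Optional[str] = None) -> str:
    """Build the (assumed) URL that describes a dataset, optionally with an API key."""
    url = f"{BASE_URL}/data/{dataset_id}"
    if api_key:
        url += "?" + urlencode({"api_key": api_key})
    return url


zoo_url = dataset_url(62)  # 62 is the dataset ID of zoo mentioned earlier
```

The resulting URL can be fetched with any HTTP client, for instance `urllib.request.urlopen(zoo_url)`. Reading public data should not require a key.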



Datasets, tasks, runs and flows can be assigned tags, either via the web interface or the API. These tags can be used to search and annotate datasets. For example, the tag OpenML100 refers to a benchmark suite used to benchmark machine learning algorithms. Anyone can add or remove tags on any entity.


You can combine datasets, flows and runs into studies, to collaborate with others online, or simply keep a log of your work.

Each study gets its own page, which can be linked to publications so that others can find all the details online.


Circles (under construction)

You can create circles of trusted researchers in which data can be shared that is not yet ready for publication.