TFDS CLI

The TFDS CLI is a command-line tool that provides various commands to easily work with TensorFlow Datasets.

Disable TF logs on import:
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import

Installation

The CLI tool is installed with tensorflow-datasets (or tfds-nightly).

pip install -q tfds-nightly apache-beam
tfds --version

To see the list of all CLI commands:

tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.

tfds new: Implementing a new dataset

This command will help you kickstart writing a new Python dataset by creating a <dataset_name>/ directory containing default implementation files.

Usage

tfds new my_dataset
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://tensorflowcn.cn/datasets/add_dataset for additional details.

tfds new my_dataset will create:

ls -1 my_dataset/
CITATIONS.bib
README.md
TAGS.txt
__init__.py
checksums.tsv
dummy_data/
my_dataset_dataset_builder.py
my_dataset_dataset_builder_test.py

An optional flag --data_format can be used to generate format-specific dataset builders (e.g., conll). If no data format is given, it will generate a template for a standard tfds.core.GeneratorBasedBuilder. Refer to the documentation for details on the available format-specific dataset builders.
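For example, a sketch of scaffolding a CoNLL-specific builder (this assumes tensorflow-datasets or tfds-nightly is already installed; the dataset name is a placeholder):

```shell
# Scaffold a dataset builder from the CoNLL format-specific template
# instead of the standard GeneratorBasedBuilder template.
tfds new my_conll_dataset --data_format=conll
```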

See our dataset writing guide for more information.

Available options

tfds new --help
usage: tfds new [-h] [--helpfull] [--data_format {standard,conll,conllu}]
                [--dir DIR]
                dataset_name

positional arguments:
  dataset_name          Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --data_format {standard,conll,conllu}
                        Optional format of the input data, which is used to
                        generate a format-specific template.
  --dir DIR             Path where the dataset directory will be created.
                        Defaults to current directory.
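The flags above can be combined; for instance, a sketch placing the generated template in a separate directory rather than the current one (the path is a placeholder):

```shell
# Create the my_dataset/ template under datasets/ instead of
# the current working directory.
tfds new my_dataset --dir=datasets/
```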

tfds build: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

  • A path to a dataset/ folder or dataset.py file (empty for the current directory):

    • tfds build datasets/my_dataset/
    • cd datasets/my_dataset/ && tfds build
    • cd datasets/my_dataset/ && tfds build my_dataset
    • cd datasets/my_dataset/ && tfds build my_dataset.py
  • A registered dataset:

    • tfds build mnist
    • tfds build my_dataset --imports my_project.datasets
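Several of the flags documented in the help output below are commonly combined. A minimal sketch, assuming tfds is installed and using placeholder paths:

```shell
# Generate only the first 100 examples of each split of MNIST into a
# custom data directory, overwriting any previous partial generation.
# All flags appear in `tfds build --help`.
tfds build mnist \
  --data_dir=/tmp/tfds \
  --max_examples_per_split=100 \
  --overwrite
```

Limiting the number of examples this way is useful for quickly testing that a builder's generation code runs end to end before committing to a full build.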

Available options

tfds build --help
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite] [--fail_if_exists]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--download_only]
                  [--config CONFIG] [--config_idx CONFIG_IDX]
                  [--update_metadata_only] [--download_config DOWNLOAD_CONFIG]
                  [--imports IMPORTS] [--register_checksums]
                  [--force_checksums_validation]
                  [--noforce_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--max_shard_size_mb MAX_SHARD_SIZE_MB]
                  [--num-processes NUM_PROCESSES] [--publish_dir PUBLISH_DIR]
                  [--skip_if_published] [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets ...]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current
                        dir. See https://tensorflowcn.cn/datasets/cli for
                        accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb Enter post-mortem debugging mode if an exception is raised.

  --overwrite           Delete pre-existing dataset if it exists.
  --fail_if_exists      Fails the program if there is a pre-existing dataset.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default
                        to 1), rather than the full dataset.If set to 0, only
                        execute the `_split_generators` (which download the
                        original data), but skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to
                        `~/tensorflow_datasets/` or `TFDS_DATA_DIR`
                        environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to
                        `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to
                        `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some
                        datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir`
                        (e.g. `<download_dir>/manual/<dataset_name>/`. Useful
                        to avoid collisions if many datasets are generated.

Generation:
  --download_only       If True, download all files but do not prepare the
                        dataset. Uses the checksum.tsv to find out what to
                        download. Therefore, this does not work in combination
                        with --register_checksums.
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
                        Can also be a json of the kwargs forwarded to the
                        config `__init__` (for custom configs).
  --config_idx CONFIG_IDX
                        Config id to build
                        (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually
                        exclusive with `--config`.
  --update_metadata_only
                        If True, existing dataset_info.json is updated with
                        metadata defined in Builder class(es). Datasets must
                        already have been prepared.
  --download_config DOWNLOAD_CONFIG
                        A json of the kwargs forwarded to the config
                        `__init__` (for custom DownloadConfigs).
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register
                        datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not
                        found.
  --noforce_checksums_validation
                        If specified, bypass the checks on the checksums.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to
                        `PipelineOptions` when preparing with Apache Beam.
                        (see:
                        https://tensorflowcn.cn/datasets/beam_datasets).
                        Example: `--beam_pipeline_options=job_name=my-
                        job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples.
                        Available values: ['tfrecord', 'riegeli',
                        'array_record'] (see `tfds.core.FileFormat`).
  --max_shard_size_mb MAX_SHARD_SIZE_MB
                        The max shard size in megabytes.
  --num-processes NUM_PROCESSES
                        Number of parallel build processes.

Publishing:
  Options for publishing successfully created datasets.

  --publish_dir PUBLISH_DIR
                        Where to optionally publish the dataset after it has
                        been generated successfully. Should be the root data
                        dir under whichdatasets are stored. If unspecified,
                        dataset will not be published
  --skip_if_published   If the dataset with the same version and config is
                        already published, then it will not be regenerated.

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined
                        here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available
                        rather than default version.
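As noted in the help output, --config also accepts a JSON object of kwargs forwarded to the config's __init__. A sketch of what that looks like on the command line (the "name" kwarg here is hypothetical; the accepted keys depend on your BuilderConfig class):

```shell
# Pass custom BuilderConfig kwargs as a JSON object instead of a
# registered config name. Quote the JSON so the shell passes it intact.
tfds build my_dataset --config '{"name": "my_config"}'
```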