Using the TFX Command-line Interface

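The TFX command-line interface (CLI) performs a full range of pipeline actions using pipeline orchestrators, such as Kubeflow Pipelines and Vertex Pipelines. The local orchestrator can also be used for faster development or debugging. Apache Beam and Apache Airflow are supported as experimental features. For example, you can use the CLI to:

Create, update, and delete pipelines.
Run a pipeline and monitor the run on various orchestrators.
List pipelines and pipeline runs.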

About the TFX CLI

The TFX CLI is installed as a part of the TFX package. All CLI commands follow the structure below:

tfx command-group command flags

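The following command groups are currently supported:

tfx pipeline - Create and manage TFX pipelines.
tfx run - Create and manage runs of TFX pipelines on various orchestration platforms.
tfx template - Experimental commands for listing and copying TFX pipeline templates.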

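Each command group provides a set of commands. To learn more about using these commands, follow the instructions in the tfx pipeline, tfx run, and tfx template sections.

Flags let you pass arguments into CLI commands. Words in flags are separated with either a hyphen (-) or an underscore (_). For example, the pipeline name flag can be specified as either --pipeline-name or --pipeline_name. For brevity, this document specifies flags with underscores. Learn more about the flags used in the TFX CLI.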
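Both spellings are accepted. Scripts that generate or compare TFX CLI arguments can therefore normalize flag names to a single form first. The following is a minimal Python sketch that assumes flags arrive as `--name=value` or as a bare `--name`. `canonical_flag` is a hypothetical helper and is not part of TFX.

```python
def canonical_flag(flag: str) -> str:
    """Return a TFX CLI flag with underscores in its name (hypothetical helper).

    Handles the '--name=value' and bare '--name' shapes only.
    """
    if not flag.startswith("--"):
        raise ValueError(f"not a long flag: {flag!r}")
    name, sep, value = flag[2:].partition("=")
    # Only the flag name is normalized. The value after '=' is left as-is,
    # because values such as 'my-pipeline' may legitimately contain hyphens.
    return "--" + name.replace("-", "_") + sep + value
```

For example, `canonical_flag("--pipeline-name=my-pipeline")` and `canonical_flag("--pipeline_name=my-pipeline")` both return `--pipeline_name=my-pipeline`.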

tfx pipeline

The structure for commands in the tfx pipeline command group is as follows:

tfx pipeline command required-flags [optional-flags]

Use the following sections to learn more about the commands in the tfx pipeline command group.

create

Creates a new pipeline in the given orchestrator.

Usage

tfx pipeline create --pipeline_path=pipeline-path [--endpoint=endpoint --engine=engine \
--iap_client_id=iap-client-id --namespace=namespace \
--build_image --build_base_image=build-base-image]

--pipeline_path=pipeline-path

The path to the pipeline configuration file.

--endpoint=endpoint

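(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.

If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command runs in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebook instance.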
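A script can check up front whether it is safe to omit --endpoint. The minimal sketch below does this with `KUBERNETES_SERVICE_HOST`, an environment variable that Kubernetes sets inside pods. This is only a heuristic: it shows that the script is in *some* pod, not that the pod is on the Kubeflow Pipelines cluster. `endpoint_flags` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional


def endpoint_flags(endpoint: Optional[str],
                   environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the --endpoint flag to pass, or [] to rely on the in-cluster default.

    Hypothetical pre-flight check. KUBERNETES_SERVICE_HOST is used as a rough
    signal that the in-cluster DNS name can resolve; it cannot tell whether
    the pod is on the Kubeflow Pipelines cluster specifically.
    """
    if endpoint:
        return [f"--endpoint={endpoint}"]
    if "KUBERNETES_SERVICE_HOST" in environ:
        return []  # Let the TFX CLI fall back to the in-cluster service name.
    raise ValueError(
        "No --endpoint given and not running in a Kubernetes pod; pass the "
        "Kubeflow Pipelines dashboard URL, e.g. https://host-name/pipeline")
```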

--engine=engine

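(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.

**Important:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, the local orchestrator is used by default.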
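The auto-detection rule above can be approximated as in the following sketch, which is not TFX's actual implementation. It assumes that Apache Airflow and Kubeflow Pipelines are detected through their import names, `airflow` and `kfp`. How the real CLI behaves when both are installed is not stated here, so this sketch simply refuses to guess. The sketch never selects `vertex` or `beam`, which are chosen explicitly with --engine.

```python
import importlib.util
from typing import Callable


def _is_installed(module: str) -> bool:
    # find_spec returns None when the top-level module cannot be found.
    return importlib.util.find_spec(module) is not None


def detect_engine(is_installed: Callable[[str], bool] = _is_installed) -> str:
    """Pick an engine name following the rule described above (sketch only)."""
    has_airflow = is_installed("airflow")
    has_kfp = is_installed("kfp")
    if has_airflow and has_kfp:
        # Assumption of this sketch: with both present, require --engine.
        raise RuntimeError("Both Airflow and KFP found; pass --engine explicitly.")
    if has_airflow:
        return "airflow"
    if has_kfp:
        return "kubeflow"
    return "local"  # Neither orchestrator installed: local is the default.
```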

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints when using Kubeflow Pipelines.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

--build_image

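(Optional.) When the engine is kubeflow or vertex and this flag is specified, TFX creates a container image for your pipeline. TFX uses the `Dockerfile` in the current directory and generates one automatically if it does not exist.

The built image is pushed to the remote registry specified in `KubeflowDagRunnerConfig` or `KubeflowV2DagRunnerConfig`.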

--build_base_image=build-base-image

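(Optional.) When the engine is kubeflow, TFX creates a container image for your pipeline. The build base image specifies the base container image to use when building the pipeline container image.

Examples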

Kubeflow

tfx pipeline create --engine=kubeflow --pipeline_path=pipeline-path \
--iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint \
--build_image

Local

tfx pipeline create --engine=local --pipeline_path=pipeline-path

Vertex

tfx pipeline create --engine=vertex --pipeline_path=pipeline-path \
--build_image

To auto-detect the engine from the user environment, simply omit the engine flag, as in the example below. For more details, see the flags section.

tfx pipeline create --pipeline_path=pipeline-path
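When the CLI is driven from a Python script (for example, in CI), a thin wrapper helps keep the optional flags straight. The minimal sketch below uses only flags documented above. `pipeline_create` is hypothetical, and `pipeline.py` in the usage is a placeholder path. By default the wrapper performs a dry run that only returns the command line.

```python
import shlex
import subprocess
from typing import Optional, Sequence


def pipeline_create(pipeline_path: str, engine: Optional[str] = None,
                    build_image: bool = False,
                    extra_flags: Sequence[str] = (),
                    dry_run: bool = True) -> str:
    """Assemble (and optionally run) a `tfx pipeline create` command.

    Hypothetical wrapper for illustration only.
    """
    cmd = ["tfx", "pipeline", "create", f"--pipeline_path={pipeline_path}"]
    if engine:  # Omit --engine to let the CLI auto-detect it.
        cmd.append(f"--engine={engine}")
    if build_image:
        cmd.append("--build_image")
    cmd.extend(extra_flags)  # e.g. --endpoint=..., --namespace=...
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

For example, `pipeline_create("pipeline.py", engine="vertex", build_image=True)` returns `tfx pipeline create --pipeline_path=pipeline.py --engine=vertex --build_image` without running anything.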

update

Updates an existing pipeline in the given orchestrator.

Usage

tfx pipeline update --pipeline_path=pipeline-path [--endpoint=endpoint --engine=engine \
--iap_client_id=iap-client-id --namespace=namespace --build_image]

--pipeline_path=pipeline-path

The path to the pipeline configuration file.

--endpoint=endpoint

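(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.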

--engine=engine

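(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.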

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

--build_image

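(Optional.) When the engine is kubeflow or vertex and this flag is specified, TFX creates a container image for your pipeline. TFX uses the `Dockerfile` in the current directory.

The built image is pushed to the remote registry specified in `KubeflowDagRunnerConfig` or `KubeflowV2DagRunnerConfig`.

Examples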

Kubeflow

tfx pipeline update --engine=kubeflow --pipeline_path=pipeline-path \
--iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint \
--build_image

Local

tfx pipeline update --engine=local --pipeline_path=pipeline-path

Vertex

tfx pipeline update --engine=vertex --pipeline_path=pipeline-path \
--build_image

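compile

Compiles the pipeline config file to create a workflow file in Kubeflow. During compilation, it performs the following checks:

Checks that the pipeline path is valid.
Checks that the pipeline details are extracted successfully from the pipeline config file.
Checks that the DagRunner in the pipeline config file matches the engine.
Checks that the workflow file is created successfully in the package path provided (Kubeflow only).

Running compile is recommended before creating or updating a pipeline.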
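The recommended order (compile first, then create or update) can be encoded once in a deployment script. In the minimal sketch below, `deploy_commands` and `run_all` are hypothetical helpers. The caller must decide whether the pipeline already exists, for example by checking the output of `tfx pipeline list`. Because `run_all` stops at the first non-zero exit, a failed compile prevents the create or update step from running.

```python
import subprocess
from typing import List, Sequence


def deploy_commands(pipeline_path: str, engine: str,
                    already_exists: bool) -> List[List[str]]:
    """Return argv lists that compile first, then create or update the pipeline.

    Hypothetical helper: whether the pipeline already exists is decided by the caller.
    """
    common = [f"--engine={engine}", f"--pipeline_path={pipeline_path}"]
    action = "update" if already_exists else "create"
    return [
        ["tfx", "pipeline", "compile", *common],
        ["tfx", "pipeline", action, *common],
    ]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run the commands in order; check=True stops at the first failing step."""
    for argv in commands:
        subprocess.run(list(argv), check=True)
```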

Usage

tfx pipeline compile --pipeline_path=pipeline-path [--engine=engine]

--pipeline_path=pipeline-path

The path to the pipeline configuration file.

--engine=engine

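(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.

Examples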

Kubeflow

tfx pipeline compile --engine=kubeflow --pipeline_path=pipeline-path

Local

tfx pipeline compile --engine=local --pipeline_path=pipeline-path

Vertex

tfx pipeline compile --engine=vertex --pipeline_path=pipeline-path

delete

Deletes a pipeline from the given orchestrator.

Usage

tfx pipeline delete --pipeline_name=pipeline-name [--endpoint=endpoint --engine=engine \
--iap_client_id=iap-client-id --namespace=namespace]

--pipeline_name=pipeline-name

The name of the pipeline.

--endpoint=endpoint

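(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.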

--engine=engine

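(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.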

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx pipeline delete --engine=kubeflow --pipeline_name=pipeline-name \
--iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint

Local

tfx pipeline delete --engine=local --pipeline_name=pipeline-name

Vertex

tfx pipeline delete --engine=vertex --pipeline_name=pipeline-name

list

Lists all the pipelines in the given orchestrator.

Usage

tfx pipeline list [--endpoint=endpoint --engine=engine \
--iap_client_id=iap-client-id --namespace=namespace]

--endpoint=endpoint

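(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.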

--engine=engine

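(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.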

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx pipeline list --engine=kubeflow --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint

Local

tfx pipeline list --engine=local

Vertex

tfx pipeline list --engine=vertex

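tfx run

The structure for commands in the tfx run command group is as follows:

tfx run command required-flags [optional-flags]

Use the following sections to learn more about the commands in the tfx run command group.

create

Creates a new run instance for a pipeline in the orchestrator. For Kubeflow, the most recent version of the pipeline in the cluster is used.

Usage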

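tfx run create --pipeline_name=pipeline-name [--endpoint=endpoint \
--engine=engine --iap_client_id=iap-client-id --namespace=namespace \
--runtime_parameter=parameter-name=parameter-value \
--project=GCP-project-id --region=GCP-region]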

--pipeline_name=pipeline-name

The name of the pipeline.

--endpoint=endpoint

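(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.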

--engine=engine

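(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.

--runtime_parameter=parameter-name=parameter-value

(Optional.) Sets a runtime parameter value. Specify the flag multiple times to set values for multiple variables. Only applicable to the `airflow`, `kubeflow`, and `vertex` engines.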
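Because the flag is repeatable, a mapping of parameters becomes one flag per variable. The minimal sketch below uses hypothetical helpers. It splits each `name=value` pair at the first `=` so that values may themselves contain `=`. That splitting rule is an assumption of this sketch, so check how your TFX version treats such values.

```python
from typing import Dict, List, Mapping


def runtime_parameter_flags(params: Mapping[str, str]) -> List[str]:
    """Emit one --runtime_parameter flag per variable (the flag is repeatable)."""
    return [f"--runtime_parameter={name}={value}" for name, value in params.items()]


def parse_runtime_parameter(flag_value: str) -> Dict[str, str]:
    """Split 'name=value' at the first '=' (assumed rule; values may contain '=')."""
    name, sep, value = flag_value.partition("=")
    if not sep or not name:
        raise ValueError(f"expected parameter-name=parameter-value, got {flag_value!r}")
    return {name: value}
```

For example, `runtime_parameter_flags({"var_name": "var_value"})` yields `["--runtime_parameter=var_name=var_value"]`, matching the Vertex example in this section.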

--iap_client_id=iap-client-id

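(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

--project=GCP-project-id

(Required for Vertex.) The GCP project ID for Vertex Pipelines.

--region=GCP-region

(Required for Vertex.) The GCP region name, such as us-central1. See the [Vertex documentation](https://cloud.google.com/vertex-ai/docs/general/locations) for available regions.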
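Because --project and --region are required only for Vertex, a script can fail early when they are missing instead of waiting for the CLI to reject the run. In this minimal sketch, `run_create_flags` is a hypothetical pre-flight helper that ignores project and region for other engines. The CLI itself may report a missing flag differently.

```python
from typing import List, Optional


def run_create_flags(engine: str, pipeline_name: str,
                     project: Optional[str] = None,
                     region: Optional[str] = None) -> List[str]:
    """Build `tfx run create` flags, failing early if Vertex-only flags are missing.

    Hypothetical helper for illustration only.
    """
    flags = [f"--engine={engine}", f"--pipeline_name={pipeline_name}"]
    if engine == "vertex":
        required = (("--project", project), ("--region", region))
        missing = [flag for flag, value in required if not value]
        if missing:
            raise ValueError(f"Vertex runs require: {', '.join(missing)}")
        flags += [f"--project={project}", f"--region={region}"]
    return flags
```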

Examples

Kubeflow

tfx run create --engine=kubeflow --pipeline_name=pipeline-name --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint

Local

tfx run create --engine=local --pipeline_name=pipeline-name

Vertex

tfx run create --engine=vertex --pipeline_name=pipeline-name \
  --runtime_parameter=var_name=var_value \
  --project=gcp-project-id --region=gcp-region

terminate

Stops a run of a given pipeline.

**Important:** Currently supported only in Kubeflow.

Usage

tfx run terminate --run_id=run-id [--endpoint=endpoint --engine=engine \
--iap_client_id=iap-client-id --namespace=namespace]

--run_id=run-id

Unique identifier for a pipeline run.

--endpoint=endpoint

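(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.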

--engine=engine

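(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow

If the engine is not set, it is auto-detected based on the environment.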

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx run terminate --engine=kubeflow --run_id=run-id --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint

list

Lists all runs of a pipeline.

**Important:** Currently not supported in Local and Apache Beam.

Usage

tfx run list --pipeline_name=pipeline-name [--endpoint=endpoint \
--engine=engine --iap_client_id=iap-client-id --namespace=namespace]

--pipeline_name=pipeline-name

The name of the pipeline.

--endpoint=endpoint

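(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.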

--engine=engine

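(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
airflow: (experimental) sets engine to Apache Airflow

If the engine is not set, it is auto-detected based on the environment.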

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx run list --engine=kubeflow --pipeline_name=pipeline-name --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint

status

Returns the current status of a run.

**Important:** Currently not supported in Local and Apache Beam.

Usage

tfx run status --pipeline_name=pipeline-name --run_id=run-id [--endpoint=endpoint \
--engine=engine --iap_client_id=iap-client-id --namespace=namespace]

--pipeline_name=pipeline-name

The name of the pipeline.

--run_id=run-id

Unique identifier for a pipeline run.

--endpoint=endpoint

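(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.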

--engine=engine

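(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
airflow: (experimental) sets engine to Apache Airflow

If the engine is not set, it is auto-detected based on the environment.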

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx run status --engine=kubeflow --run_id=run-id --pipeline_name=pipeline-name \
--iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint

delete

Deletes a run of a given pipeline.

**Important:** Currently supported only in Kubeflow.

Usage

tfx run delete --run_id=run-id [--engine=engine --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint]

--run_id=run-id

Unique identifier for a pipeline run.

--endpoint=endpoint

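(Optional.) The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.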

--engine=engine

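(Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow

If the engine is not set, it is auto-detected based on the environment.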

--iap_client_id=iap-client-id

(Optional.) Client ID for IAP-protected endpoints.

--namespace=namespace

(Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Examples

Kubeflow

tfx run delete --engine=kubeflow --run_id=run-id --iap_client_id=iap-client-id \
--namespace=namespace --endpoint=endpoint
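Engine support differs across the tfx run commands. A script that accepts a user-supplied engine can check the combination before calling the CLI. In the minimal sketch below, the table is transcribed by hand from the engine lists in the sections above, so it may lag behind your TFX version. `RUN_SUBCOMMAND_ENGINES` and `check_run_subcommand` are hypothetical names.

```python
from typing import Dict, FrozenSet

# Engines that each `tfx run` subcommand accepts according to this guide.
# Transcribed by hand from the sections above; verify against your TFX version.
RUN_SUBCOMMAND_ENGINES: Dict[str, FrozenSet[str]] = {
    "create": frozenset({"kubeflow", "local", "vertex", "airflow", "beam"}),
    "terminate": frozenset({"kubeflow"}),
    "list": frozenset({"kubeflow", "airflow"}),
    "status": frozenset({"kubeflow", "airflow"}),
    "delete": frozenset({"kubeflow"}),
}


def check_run_subcommand(subcommand: str, engine: str) -> None:
    """Raise before invoking the CLI if this guide lists the pair as unsupported."""
    allowed = RUN_SUBCOMMAND_ENGINES.get(subcommand)
    if allowed is None:
        raise ValueError(f"unknown tfx run subcommand: {subcommand!r}")
    if engine not in allowed:
        raise ValueError(
            f"tfx run {subcommand} does not support engine {engine!r}; "
            f"documented engines: {sorted(allowed)}")
```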

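tfx template [Experimental]

The structure for commands in the tfx template command group is as follows:

tfx template command required-flags [optional-flags]

Use the following sections to learn more about the commands in the tfx template command group. Templates are an experimental feature and are subject to change at any time.

list

Lists available TFX pipeline templates.

Usage

tfx template list

copy

Copies a template to the destination directory.

Usage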

tfx template copy --model=model --pipeline_name=pipeline-name \
--destination_path=destination-path

--model=model: The name of the model built by the pipeline template.
--pipeline_name=pipeline-name: The name of the pipeline.
--destination_path=destination-path: The path to copy the template to.

Understanding TFX CLI flags

Common flags

--engine=engine

The orchestrator to be used for the pipeline. The value of engine must match one of the following values:

kubeflow: sets engine to Kubeflow
local: sets engine to the local orchestrator
vertex: sets engine to Vertex Pipelines
airflow: (experimental) sets engine to Apache Airflow
beam: (experimental) sets engine to Apache Beam

If the engine is not set, it is auto-detected based on the environment.

--pipeline_name=pipeline-name

The name of the pipeline.

--pipeline_path=pipeline-path

The path to the pipeline configuration file.

--run_id=run-id

Unique identifier for a pipeline run.

Kubeflow-specific flags

--endpoint=endpoint

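The endpoint of the Kubeflow Pipelines API service. This endpoint is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should look something like:

https://host-name/pipeline

If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.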

--iap_client_id=iap-client-id

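Client ID for IAP-protected endpoints.

--namespace=namespace

Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.

Files generated by the TFX CLI

When pipelines are created and run, several files are generated for pipeline management.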

${HOME}/tfx/local, beam, airflow, vertex
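- Pipeline metadata read from the configuration is stored under ${HOME}/tfx/${ORCHESTRATION_ENGINE}/${PIPELINE_NAME}. This location can be customized by setting environment variables such as AIRFLOW_HOME or KUBEFLOW_HOME. This behavior might change in future releases. The directory stores pipeline information needed to create runs or update pipelines, including pipeline IDs in the Kubeflow Pipelines cluster.
- Before TFX 0.25, these files were located under ${HOME}/${ORCHESTRATION_ENGINE}. In TFX 0.25, files in the old location are moved to the new location automatically to ensure a smooth migration.
- From TFX 0.27, kubeflow does not create these metadata files in the local filesystem. However, see below for other files that kubeflow creates.
(Kubeflow only) Dockerfile and a container image
- Kubeflow Pipelines requires two kinds of input for a pipeline. These files are generated by TFX in the current directory.
- One is a container image, which is used to run the components in the pipeline. This container image is built when a pipeline for Kubeflow Pipelines is created or updated with the --build_image flag. If a Dockerfile does not exist, the TFX CLI generates one. It then builds the image and pushes it to the registry specified in KubeflowDagRunnerConfig.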
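A script can compute the metadata location from the pattern above, for example to clean up after deleting a pipeline. In the minimal sketch below, `pipeline_metadata_dir` is hypothetical. It assumes that an `<ENGINE>_HOME` variable such as AIRFLOW_HOME replaces the whole ${HOME}/tfx/<engine> prefix, but the exact override semantics are not spelled out here. From TFX 0.27, kubeflow does not write these files, so the directory may not exist for that engine.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pipeline_metadata_dir(engine: str, pipeline_name: str,
                          environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return ${HOME}/tfx/<engine>/<pipeline_name>, or an <ENGINE>_HOME override.

    Assumption: <ENGINE>_HOME (e.g. AIRFLOW_HOME) replaces ${HOME}/tfx/<engine>.
    """
    env = os.environ if environ is None else environ
    override = env.get(f"{engine.upper()}_HOME")
    base = Path(override) if override else Path(env["HOME"]) / "tfx" / engine
    return base / pipeline_name
```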