TFX Estimator 组件教程

TensorFlow Extended (TFX) 的组件介绍

在 TensorFlow.org 上查看

在 Google Colab 中运行

在 GitHub 上查看源代码

下载笔记本

警告： Estimators 不推荐用于新代码。Estimators 运行 v1.Session 样式代码，这种代码更难写正确，并且可能出现意外行为，尤其是在与 TF 2 代码结合使用时。Estimators 确实属于我们的兼容性保证，但除了安全漏洞之外，将不会收到任何修复。有关详细信息，请参阅迁移指南。

本基于 Colab 的教程将以交互方式逐步介绍 TensorFlow Extended (TFX) 的每个内置组件。

它涵盖了端到端机器学习管道中的每个步骤，从数据摄取到将模型推送到服务。

完成后，本笔记本的内容可以自动导出为 TFX 管道源代码，您可以使用 Apache Airflow 和 Apache Beam 来协调这些代码。

背景

本笔记本演示了如何在 Jupyter/Colab 环境中使用 TFX。在这里，我们以交互式笔记本的形式逐步介绍芝加哥出租车示例。

在交互式笔记本中工作是熟悉 TFX 管道结构的一种有用方法。它在开发您自己的管道时作为轻量级开发环境也很有用，但您应该注意，交互式笔记本的协调方式以及它们访问元数据工件的方式存在差异。

协调

在 TFX 的生产部署中，您将使用 Apache Airflow、Kubeflow Pipelines 或 Apache Beam 等协调器来协调预定义的 TFX 组件管道图。在交互式笔记本中，笔记本本身就是协调器，在您执行笔记本单元时运行每个 TFX 组件。

元数据

在 TFX 的生产部署中，您将通过 ML 元数据 (MLMD) API 访问元数据。MLMD 将元数据属性存储在 MySQL 或 SQLite 等数据库中，并将元数据有效负载存储在持久性存储中，例如您的文件系统上。在交互式笔记本中，属性和有效负载都存储在 Jupyter 笔记本或 Colab 服务器上的 /tmp 目录中的一个短暂的 SQLite 数据库中。

设置

首先，我们安装并导入必要的包，设置路径并下载数据。

升级 Pip

为了避免在本地运行时升级系统中的 Pip，请检查我们是否在 Colab 中运行。本地系统当然可以单独升级。

try:
  import colab
  !pip install --upgrade pip
except:
  pass

安装 TFX

pip install tfx

您是否重新启动了运行时？

如果您使用的是 Google Colab，那么第一次运行上面的单元格时，您必须重新启动运行时（运行时 > 重新启动运行时...）。这是因为 Colab 加载包的方式。

导入包

我们导入必要的包，包括标准 TFX 组件类。

import os
import pprint
import tempfile
import urllib

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip

2024-05-08 09:28:45.211472: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-08 09:28:45.211527: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-08 09:28:45.213202: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

让我们检查一下库版本。

print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))

TensorFlow version: 2.15.1
TFX version: 1.15.0

设置管道路径

# This is the root directory for your TFX pip package installation.
_tfx_root = tfx.__path__[0]

# This is the directory containing the TFX Chicago Taxi Pipeline example.
_taxi_root = os.path.join(_tfx_root, 'examples/chicago_taxi_pipeline')

# This is the path where your model will be pushed for serving.
_serving_model_dir = os.path.join(
    tempfile.mkdtemp(), 'serving_model/taxi_simple')

# Set up logging.
absl.logging.set_verbosity(absl.logging.INFO)

下载示例数据

我们下载示例数据集以供我们的 TFX 管道使用。

我们使用的数据集是芝加哥市发布的出租车行程数据集。此数据集中的列为

pickup_community_area	fare	trip_start_month
trip_start_hour	trip_start_day	trip_start_timestamp
pickup_latitude	pickup_longitude	下车纬度
下车经度	行程里程	上车人口普查区
下车人口普查区	支付方式	公司
行程时长（秒）	下车社区区域	小费

使用此数据集，我们将构建一个模型来预测行程的 小费。

_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

('/tmpfs/tmp/tfx-datafzazc6h4/data.csv',
 <http.client.HTTPMessage at 0x7f393fe02ee0>)

快速浏览一下 CSV 文件。

head {_data_filepath}

pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
,12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0
,0,3,19,5,1362683700,,,,,0,,,Unknown,Chicago Elite Cab Corp.,300,,0
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
16,32.05,12,1,1,1417916700,41.953582125,-87.72345239,,,15.4,,,Cash,,1200,,0.0
30,38.45,10,10,5,1444301100,41.839086906,-87.714003807,,,14.6,,,Cash,,2580,,0.0
11,14.65,1,1,3,1358213400,41.978829526,-87.771166703,,,5.81,,,Cash,,1080,,0.0

免责声明：本网站提供使用从原始来源 www.cityofchicago.org（芝加哥市官方网站）修改后的数据的应用程序。芝加哥市不对本网站提供的任何数据的內容、准确性、及时性或完整性做出任何声明。本网站提供的数据随时可能发生变化。了解到本网站提供的数据的使用风险自负。

创建 InteractiveContext

最后，我们创建一个 InteractiveContext，它将允许我们在该笔记本中以交互方式运行 TFX 组件。

# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
context = InteractiveContext()

WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/metadata.sqlite.

以交互方式运行 TFX 组件

在接下来的单元格中，我们将一个接一个地创建 TFX 组件，运行每个组件，并可视化它们的输出工件。

ExampleGen

The ExampleGen 组件通常位于 TFX 管道的开头。它将

将数据分成训练集和评估集（默认情况下，2/3 训练 + 1/3 评估）
将数据转换为 tf.Example 格式（了解更多信息此处）
将数据复制到 _tfx_root 目录中，以便其他组件可以访问

ExampleGen 将数据源路径作为输入。在我们的例子中，这是包含已下载 CSV 的 _data_root 路径。

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
context.run(example_gen)

INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Processing input csv data /tmpfs/tmp/tfx-datafzazc6h4/* to TFExample.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized

让我们检查一下 ExampleGen 的输出工件。该组件生成两个工件，训练示例和评估示例

artifact = example_gen.outputs['examples'].get()[0]
print(artifact.split_names, artifact.uri)

["train", "eval"] /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/CsvExampleGen/examples/1

我们还可以查看前三个训练示例

# Get the URI of the output artifact representing the training examples, which is a directory
train_uri = os.path.join(example_gen.outputs['examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)

features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Chicago Elite Cab Corp. (Chicago Carriag"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 12.449999809265137
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Credit Card"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1400269500
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Taxi Affiliation Services"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 27.049999237060547
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.836151123046875
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.64878845214844
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 12.600000381469727
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 1380
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1380593700
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 16.450000762939453
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.98363494873047
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.72357940673828
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 6.900000095367432
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 780
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1446554700
      }
    }
  }
}

现在 ExampleGen 已完成数据摄取，下一步是数据分析。

StatisticsGen

The StatisticsGen 组件计算数据集上的统计信息，用于数据分析，以及在后续组件中使用。它使用 TensorFlow 数据验证库。

StatisticsGen 将我们刚刚使用 ExampleGen 摄取的数据集作为输入。

statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/StatisticsGen/statistics/2/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/StatisticsGen/statistics/2/Split-eval.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

在 StatisticsGen 完成运行后，我们可以可视化输出的统计信息。尝试使用不同的绘图！

context.show(statistics_gen.outputs['statistics'])

SchemaGen

The SchemaGen 组件根据您的数据统计信息生成一个模式。（模式定义了数据集中的特征的预期边界、类型和属性。）它还使用 TensorFlow 数据验证库。

SchemaGen 将使用我们使用 StatisticsGen 生成的统计信息作为输入，默认情况下查看训练拆分。

schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

在 SchemaGen 完成运行后，我们可以将生成的模式可视化为表格。

context.show(schema_gen.outputs['schema'])

数据集中的每个特征都显示为模式表中的一行，以及其属性。模式还捕获分类特征所采用的所有值，表示为其域。

要了解有关模式的更多信息，请参见 SchemaGen 文档。

ExampleValidator

The ExampleValidator 组件根据模式定义的期望检测数据中的异常。它还使用 TensorFlow 数据验证库。

ExampleValidator 将使用来自 StatisticsGen 的统计信息以及来自 SchemaGen 的模式作为输入。

example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Anomalies alerts created for split train.
INFO:absl:Validation complete for split train. Anomalies written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/ExampleValidator/anomalies/4/Split-train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Anomalies alerts created for split eval.
INFO:absl:Validation complete for split eval. Anomalies written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/ExampleValidator/anomalies/4/Split-eval.
INFO:absl:Running publisher for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized

在 ExampleValidator 完成运行后，我们可以将异常可视化为表格。

context.show(example_validator.outputs['anomalies'])

在异常表中，我们可以看到没有异常。这是我们所期望的，因为这是我们分析的第一个数据集，并且模式是根据它定制的。您应该审查此模式 - 任何意外情况都意味着数据中的异常。审查后，模式可用于保护未来的数据，并且此处产生的异常可用于调试模型性能，了解数据如何随时间推移而演变，以及识别数据错误。

转换

The Transform 组件对训练和服务执行特征工程。它使用 TensorFlow Transform 库。

Transform 将使用来自 ExampleGen 的数据、来自 SchemaGen 的模式以及包含用户定义的 Transform 代码的模块作为输入。

让我们在下面查看用户定义的 Transform 代码的示例（有关 TensorFlow Transform API 的介绍，请参阅教程）。首先，我们定义一些用于特征工程的常量

_taxi_constants_module_file = 'taxi_constants.py'

%%writefile {_taxi_constants_module_file}

# Categorical features are assumed to each have a maximum value in the dataset.
MAX_CATEGORICAL_FEATURE_VALUES = [24, 31, 12]

CATEGORICAL_FEATURE_KEYS = [
    'trip_start_hour', 'trip_start_day', 'trip_start_month',
    'pickup_census_tract', 'dropoff_census_tract', 'pickup_community_area',
    'dropoff_community_area'
]

DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare', 'trip_seconds']

# Number of buckets used by tf.transform for encoding each feature.
FEATURE_BUCKET_COUNT = 10

BUCKET_FEATURE_KEYS = [
    'pickup_latitude', 'pickup_longitude', 'dropoff_latitude',
    'dropoff_longitude'
]

# Number of vocabulary terms used for encoding VOCAB_FEATURES by tf.transform
VOCAB_SIZE = 1000

# Count of out-of-vocab buckets in which unrecognized VOCAB_FEATURES are hashed.
OOV_SIZE = 10

VOCAB_FEATURE_KEYS = [
    'payment_type',
    'company',
]

# Keys
LABEL_KEY = 'tips'
FARE_KEY = 'fare'

Writing taxi_constants.py

接下来，我们编写一个 preprocessing_fn，它将原始数据作为输入，并返回模型可以训练的转换后的特征

_taxi_transform_module_file = 'taxi_transform.py'

%%writefile {_taxi_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_FARE_KEY = taxi_constants.FARE_KEY
_LABEL_KEY = taxi_constants.LABEL_KEY


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  for key in _DENSE_FLOAT_FEATURE_KEYS:
    # If sparse make it dense, setting nan's to 0 or '', and apply zscore.
    outputs[key] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  for key in _VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[key] = tft.compute_and_apply_vocabulary(
        _fill_in_missing(inputs[key]),
        top_k=_VOCAB_SIZE,
        num_oov_buckets=_OOV_SIZE)

  for key in _BUCKET_FEATURE_KEYS:
    outputs[key] = tft.bucketize(
        _fill_in_missing(inputs[key]), _FEATURE_BUCKET_COUNT)

  for key in _CATEGORICAL_FEATURE_KEYS:
    outputs[key] = _fill_in_missing(inputs[key])

  # Was this passenger a big tipper?
  taxi_fare = _fill_in_missing(inputs[_FARE_KEY])
  tips = _fill_in_missing(inputs[_LABEL_KEY])
  outputs[_LABEL_KEY] = tf.where(
      tf.math.is_nan(taxi_fare),
      tf.cast(tf.zeros_like(taxi_fare), tf.int64),
      # Test if the tip was > 20% of the fare.
      tf.cast(
          tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))

  return outputs


def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 1
      in the second dimension.
  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  if not isinstance(x, tf.sparse.SparseTensor):
    return x

  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)

Writing taxi_transform.py

现在，我们将此特征工程代码传递给 Transform 组件并运行它以转换您的数据。

transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_taxi_transform_module_file))
context.run(transform)

INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_transform.py' (including modules: ['taxi_transform', 'taxi_constants']).
INFO:absl:User module package has hash fingerprint version f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpagxnedeh/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmpmro2o5jq', '--dist-dir', '/tmpfs/tmp/tmpqxoax40b']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'; target user module is 'taxi_transform'.
INFO:absl:Full user module path is 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_transform.py -> build/lib
copying taxi_constants.py -> build/lib
installing to /tmpfs/tmp/tmpmro2o5jq
running install
running install_lib
copying build/lib/taxi_transform.py -> /tmpfs/tmp/tmpmro2o5jq
copying build/lib/taxi_constants.py -> /tmpfs/tmp/tmpmro2o5jq
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmpfs/tmp/tmpmro2o5jq/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmpmro2o5jq/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL
creating '/tmpfs/tmp/tmpqxoax40b/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' and adding '/tmpfs/tmp/tmpmro2o5jq' to it
adding 'taxi_constants.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/RECORD'
removing /tmpfs/tmp/tmpmro2o5jq
INFO:absl:Running executor for Transform
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5nkxki37', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5jgr4gdg', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmppif0r6r3', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/27474b1a835b4a51961d9ff1455e2a16/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/27474b1a835b4a51961d9ff1455e2a16/fingerprint.pb
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/ff049abe3d52450a806a6fe8069cd573/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/ff049abe3d52450a806a6fe8069cd573/fingerprint.pb
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

让我们检查一下 Transform 的输出工件。该组件生成两种类型的输出

transform_graph 是可以执行预处理操作的图（此图将包含在服务和评估模型中）。
transformed_examples 表示预处理的训练和评估数据。

transform.outputs

{'transform_graph': OutputChannel(artifact_type=TransformGraph, producer_component_id=Transform, output_key=transform_graph, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'transformed_examples': OutputChannel(artifact_type=Examples, producer_component_id=Transform, output_key=transformed_examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'updated_analyzer_cache': OutputChannel(artifact_type=TransformCache, producer_component_id=Transform, output_key=updated_analyzer_cache, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'pre_transform_schema': OutputChannel(artifact_type=Schema, producer_component_id=Transform, output_key=pre_transform_schema, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'pre_transform_stats': OutputChannel(artifact_type=ExampleStatistics, producer_component_id=Transform, output_key=pre_transform_stats, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_schema': OutputChannel(artifact_type=Schema, producer_component_id=Transform, output_key=post_transform_schema, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_stats': OutputChannel(artifact_type=ExampleStatistics, producer_component_id=Transform, output_key=post_transform_stats, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_anomalies': OutputChannel(artifact_type=ExampleAnomalies, producer_component_id=Transform, output_key=post_transform_anomalies, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

看一下 transform_graph 工件。它指向包含三个子目录的目录。

train_uri = transform.outputs['transform_graph'].get()[0].uri
os.listdir(train_uri)

['transform_fn', 'metadata', 'transformed_metadata']

The transformed_metadata 子目录包含预处理数据的模式。The transform_fn 子目录包含实际的预处理图。The metadata 子目录包含原始数据的模式。

我们还可以查看前三个转换后的示例

# Get the URI of the output artifact representing the transformed examples, which is a directory
train_uri = os.path.join(transform.outputs['transformed_examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 8
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.061060599982738495
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: -0.15886740386486053
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: -0.7118487358093262
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 1.2521240711212158
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.532160758972168
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.5509493350982666
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 48
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.3873794376850128
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.21955278515815735
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.0019067146349698305
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
}

在 Transform 组件将您的数据转换为特征后，下一步是训练模型。

Trainer

The Trainer 组件将训练您在 TensorFlow 中定义的模型（使用 Estimator API 或使用 model_to_estimator 的 Keras API）。

Trainer 将使用来自 SchemaGen 的模式、来自 Transform 的转换后的数据和图、训练参数以及包含用户定义的模型代码的模块作为输入。

让我们在下面查看用户定义的模型代码的示例（有关 TensorFlow Estimator API 的介绍，请参阅教程）

_taxi_trainer_module_file = 'taxi_trainer.py'

%%writefile {_taxi_trainer_module_file}

import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils
from tfx_bsl.tfxio import dataset_options

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_MAX_CATEGORICAL_FEATURE_VALUES = taxi_constants.MAX_CATEGORICAL_FEATURE_VALUES
_LABEL_KEY = taxi_constants.LABEL_KEY


# Tf.Transform considers these features as "raw"
def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _build_estimator(config, hidden_units=None, warm_start_from=None):
  """Build an estimator for predicting the tipping behavior of taxi riders.
  Args:
    config: tf.estimator.RunConfig defining the runtime environment for the
      estimator (including model_dir).
    hidden_units: [int], the layer sizes of the DNN (input layer first)
    warm_start_from: Optional directory to warm start from.
  Returns:
    A dict of the following:
      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  real_valued_columns = [
      tf.feature_column.numeric_column(key, shape=())
      for key in _DENSE_FLOAT_FEATURE_KEYS
  ]
  categorical_columns = [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_VOCAB_SIZE + _OOV_SIZE, default_value=0)
      for key in _VOCAB_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_FEATURE_BUCKET_COUNT, default_value=0)
      for key in _BUCKET_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(  # pylint: disable=g-complex-comprehension
          key,
          num_buckets=num_buckets,
          default_value=0) for key, num_buckets in zip(
              _CATEGORICAL_FEATURE_KEYS,
              _MAX_CATEGORICAL_FEATURE_VALUES)
  ]
  return tf.estimator.DNNLinearCombinedClassifier(
      config=config,
      linear_feature_columns=categorical_columns,
      dnn_feature_columns=real_valued_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25],
      warm_start_from=warm_start_from)


def _example_serving_receiver_fn(tf_transform_graph, schema):
  """Build the serving in inputs.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    Tensorflow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(_LABEL_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_graph.transform_raw_features(
      serving_input_receiver.features)

  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_graph, schema):
  """Build everything needed for the tf-model-analysis to run the model.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    EvalInputReceiver function, which contains:
      - Tensorflow graph which parses raw untransformed features, applies the
        tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_tensor')

  # Add a parse_example operator to the tensorflow graph, which will parse
  # raw, untransformed, tf examples.
  features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)

  # Now that we have our raw examples, process them through the tf-transform
  # function computed during the preprocessing step.
  transformed_features = tf_transform_graph.transform_raw_features(
      features)

  # The key name MUST be 'examples'.
  receiver_tensors = {'examples': serialized_tf_example}

  # NOTE: Model is driven by transformed features (since training works on the
  # materialized output of TFT, but slicing will happen on raw features.
  features.update(transformed_features)

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=receiver_tensors,
      labels=transformed_features[_LABEL_KEY])


def _input_fn(file_pattern, data_accessor, tf_transform_output, batch_size=200):
  """Generates features and label for tuning/training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY),
      tf_transform_output.transformed_metadata.schema)


# TFX will call this function
def trainer_fn(trainer_fn_args, schema):
  """Build the estimator using the high level API.
  Args:
    trainer_fn_args: Holds args used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.
  Returns:
    A dict of the following:
      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  # Number of nodes in the first layer of the DNN
  first_dnn_layer_size = 100
  num_dnn_layers = 4
  dnn_decay_factor = 0.7

  train_batch_size = 40
  eval_batch_size = 40

  tf_transform_graph = tft.TFTransformOutput(trainer_fn_args.transform_output)

  train_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.train_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=train_batch_size)

  eval_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.eval_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=eval_batch_size)

  train_spec = tf.estimator.TrainSpec(  # pylint: disable=g-long-lambda
      train_input_fn,
      max_steps=trainer_fn_args.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=trainer_fn_args.eval_steps,
      exporters=[exporter],
      name='chicago-taxi-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=999, keep_checkpoint_max=1)

  run_config = run_config.replace(model_dir=trainer_fn_args.serving_model_dir)

  estimator = _build_estimator(
      # Construct layers sizes with exponetial decay
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ],
      config=run_config,
      warm_start_from=trainer_fn_args.base_model)

  # Create an input receiver for TFMA processing
  receiver_fn = lambda: _eval_input_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }

Writing taxi_trainer.py

现在，我们将此模型代码传递给 Trainer 组件并运行它以训练模型。

from tfx.components.trainer.executor import Executor
from tfx.dsl.components.base import executor_spec

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_taxi_trainer_module_file),
    custom_executor_spec=executor_spec.ExecutorClassSpec(Executor),
    examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=tfx.proto.TrainArgs(num_steps=10000),
    eval_args=tfx.proto.EvalArgs(num_steps=5000))
context.run(trainer)

WARNING:absl:`custom_executor_spec` is deprecated. Please customize component directly.
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py' (including modules: ['taxi_trainer', 'taxi_transform', 'taxi_constants']).
INFO:absl:User module package has hash fingerprint version e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpnta9yff_/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmp5wdb8vq6', '--dist-dir', '/tmpfs/tmp/tmp8v8hc7ie']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'; target user module is 'taxi_trainer'.
INFO:absl:Full user module path is 'taxi_trainer@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_trainer.py -> build/lib
copying taxi_transform.py -> build/lib
copying taxi_constants.py -> build/lib
installing to /tmpfs/tmp/tmp5wdb8vq6
running install
running install_lib
copying build/lib/taxi_trainer.py -> /tmpfs/tmp/tmp5wdb8vq6
copying build/lib/taxi_transform.py -> /tmpfs/tmp/tmp5wdb8vq6
copying build/lib/taxi_constants.py -> /tmpfs/tmp/tmp5wdb8vq6
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmpfs/tmp/tmp5wdb8vq6/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmp5wdb8vq6/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL
creating '/tmpfs/tmp/tmp8v8hc7ie/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' and adding '/tmpfs/tmp/tmp5wdb8vq6' to it
adding 'taxi_constants.py'
adding 'taxi_trainer.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/RECORD'
removing /tmpfs/tmp/tmp5wdb8vq6
INFO:absl:Running executor for Trainer
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
INFO:absl:udf_utils.get_fn {'train_args': '{\n  "num_steps": 10000\n}', 'eval_args': '{\n  "num_steps": 5000\n}', 'module_file': None, 'run_fn': None, 'trainer_fn': None, 'custom_config': 'null', 'module_path': 'taxi_trainer@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'} 'trainer_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp86irddos', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'.
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:188: TrainSpec.__new__ (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:195: FinalExporter.__init__ (from tensorflow_estimator.python.estimator.exporter) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:196: EvalSpec.__new__ (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:202: RunConfig.__init__ (from tensorflow_estimator.python.estimator.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:41: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:45: categorical_column_with_identity (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:62: DNNLinearCombinedClassifierV2.__init__ (from tensorflow_estimator.python.estimator.canned.dnn_linear_combined) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/head_utils.py:54: BinaryClassHead.__init__ (from tensorflow_estimator.python.estimator.head.binary_class_head) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/dnn_linear_combined.py:586: Estimator.__init__ (from tensorflow_estimator.python.estimator.estimator) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:absl:Training model.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx/components/trainer/executor.py:270: train_and_evaluate (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:385: StopAtStepHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx_bsl/tfxio/tf_example_record.py:343: parse_example_dataset (from tensorflow.python.data.experimental.ops.parsing_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(tf.io.parse_example(...))` instead.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/src/optimizers/legacy/adagrad.py:93: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/model_fn.py:250: EstimatorSpec.__new__ (from tensorflow_estimator.python.estimator.model_fn) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1416: NanTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1419: LoggingTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py:232: SecondOrStepTimer.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1456: CheckpointSaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Create CheckpointSaverHook.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:579: StepCounterHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:586: SummarySaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2024-05-08 09:29:41.217001: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

    for Tuple type infernce function 0
    while inferring type of node 'dnn/zero_fraction/cond/output/_18'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1455: SessionRunArgs.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1454: SessionRunContext.__init__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1474: SessionRunValues.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:loss = 0.7004041, step = 0
INFO:tensorflow:global_step/sec: 94.9559
INFO:tensorflow:loss = 0.58275115, step = 100 (1.054 sec)
INFO:tensorflow:global_step/sec: 126.379
INFO:tensorflow:loss = 0.5188283, step = 200 (0.791 sec)
INFO:tensorflow:global_step/sec: 127.047
INFO:tensorflow:loss = 0.5207444, step = 300 (0.787 sec)
INFO:tensorflow:global_step/sec: 126.868
INFO:tensorflow:loss = 0.5565426, step = 400 (0.788 sec)
INFO:tensorflow:global_step/sec: 128.906
INFO:tensorflow:loss = 0.44116384, step = 500 (0.776 sec)
INFO:tensorflow:global_step/sec: 127.421
INFO:tensorflow:loss = 0.4577001, step = 600 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.514
INFO:tensorflow:loss = 0.4446908, step = 700 (0.784 sec)
INFO:tensorflow:global_step/sec: 125.617
INFO:tensorflow:loss = 0.4720246, step = 800 (0.796 sec)
INFO:tensorflow:global_step/sec: 131.021
INFO:tensorflow:loss = 0.4437019, step = 900 (0.763 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 999...
INFO:tensorflow:Saving checkpoints for 999 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/saver.py:1067: remove_checkpoint (from tensorflow.python.checkpoint.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 999...
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2024-05-08T09:29:55
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/evaluation.py:260: FinalOpsHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 31.08471s
INFO:tensorflow:Finished evaluation at 2024-05-08-09:30:26
INFO:tensorflow:Saving dict for global step 999: accuracy = 0.771235, accuracy_baseline = 0.771235, auc = 0.9186248, auc_precision_recall = 0.6492264, average_loss = 0.4624323, global_step = 999, label/mean = 0.228765, loss = 0.46243173, precision = 0.0, prediction/mean = 0.24984238, recall = 0.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:global_step/sec: 2.90748
INFO:tensorflow:loss = 0.42686376, step = 1000 (34.394 sec)
INFO:tensorflow:global_step/sec: 126.964
INFO:tensorflow:loss = 0.44425964, step = 1100 (0.788 sec)
INFO:tensorflow:global_step/sec: 127.535
INFO:tensorflow:loss = 0.47197676, step = 1200 (0.784 sec)
INFO:tensorflow:global_step/sec: 125.471
INFO:tensorflow:loss = 0.506038, step = 1300 (0.797 sec)
INFO:tensorflow:global_step/sec: 127.967
INFO:tensorflow:loss = 0.4099118, step = 1400 (0.781 sec)
INFO:tensorflow:global_step/sec: 126.812
INFO:tensorflow:loss = 0.5171037, step = 1500 (0.789 sec)
INFO:tensorflow:global_step/sec: 131.046
INFO:tensorflow:loss = 0.4371317, step = 1600 (0.763 sec)
INFO:tensorflow:global_step/sec: 128.146
INFO:tensorflow:loss = 0.45776543, step = 1700 (0.780 sec)
INFO:tensorflow:global_step/sec: 128.92
INFO:tensorflow:loss = 0.50659925, step = 1800 (0.776 sec)
INFO:tensorflow:global_step/sec: 129.06
INFO:tensorflow:loss = 0.43417373, step = 1900 (0.775 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1998...
INFO:tensorflow:Saving checkpoints for 1998 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1998...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 103.772
INFO:tensorflow:loss = 0.45863923, step = 2000 (0.963 sec)
INFO:tensorflow:global_step/sec: 125.899
INFO:tensorflow:loss = 0.3514557, step = 2100 (0.795 sec)
INFO:tensorflow:global_step/sec: 129.752
INFO:tensorflow:loss = 0.43468484, step = 2200 (0.771 sec)
INFO:tensorflow:global_step/sec: 131.014
INFO:tensorflow:loss = 0.48132992, step = 2300 (0.763 sec)
INFO:tensorflow:global_step/sec: 130.271
INFO:tensorflow:loss = 0.44048753, step = 2400 (0.768 sec)
INFO:tensorflow:global_step/sec: 129.449
INFO:tensorflow:loss = 0.3523005, step = 2500 (0.773 sec)
INFO:tensorflow:global_step/sec: 130.936
INFO:tensorflow:loss = 0.3773502, step = 2600 (0.764 sec)
INFO:tensorflow:global_step/sec: 129.258
INFO:tensorflow:loss = 0.43350023, step = 2700 (0.774 sec)
INFO:tensorflow:global_step/sec: 133.75
INFO:tensorflow:loss = 0.37304792, step = 2800 (0.748 sec)
INFO:tensorflow:global_step/sec: 130.275
INFO:tensorflow:loss = 0.3801176, step = 2900 (0.768 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 2997...
INFO:tensorflow:Saving checkpoints for 2997 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 2997...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 107.092
INFO:tensorflow:loss = 0.3836586, step = 3000 (0.933 sec)
INFO:tensorflow:global_step/sec: 133.249
INFO:tensorflow:loss = 0.43525982, step = 3100 (0.751 sec)
INFO:tensorflow:global_step/sec: 124.615
INFO:tensorflow:loss = 0.42075485, step = 3200 (0.803 sec)
INFO:tensorflow:global_step/sec: 122.737
INFO:tensorflow:loss = 0.3901537, step = 3300 (0.815 sec)
INFO:tensorflow:global_step/sec: 122.54
INFO:tensorflow:loss = 0.35952353, step = 3400 (0.816 sec)
INFO:tensorflow:global_step/sec: 124.721
INFO:tensorflow:loss = 0.3873772, step = 3500 (0.802 sec)
INFO:tensorflow:global_step/sec: 128.574
INFO:tensorflow:loss = 0.36566123, step = 3600 (0.778 sec)
INFO:tensorflow:global_step/sec: 126.009
INFO:tensorflow:loss = 0.40229043, step = 3700 (0.794 sec)
INFO:tensorflow:global_step/sec: 122.1
INFO:tensorflow:loss = 0.4070228, step = 3800 (0.819 sec)
INFO:tensorflow:global_step/sec: 123.903
INFO:tensorflow:loss = 0.4688112, step = 3900 (0.807 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3996...
INFO:tensorflow:Saving checkpoints for 3996 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3996...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 100.776
INFO:tensorflow:loss = 0.49602365, step = 4000 (0.992 sec)
INFO:tensorflow:global_step/sec: 123.848
INFO:tensorflow:loss = 0.2742646, step = 4100 (0.808 sec)
INFO:tensorflow:global_step/sec: 123.423
INFO:tensorflow:loss = 0.44800407, step = 4200 (0.810 sec)
INFO:tensorflow:global_step/sec: 123.175
INFO:tensorflow:loss = 0.43835735, step = 4300 (0.812 sec)
INFO:tensorflow:global_step/sec: 123.891
INFO:tensorflow:loss = 0.32207388, step = 4400 (0.807 sec)
INFO:tensorflow:global_step/sec: 125.024
INFO:tensorflow:loss = 0.38216084, step = 4500 (0.800 sec)
INFO:tensorflow:global_step/sec: 124.891
INFO:tensorflow:loss = 0.45092455, step = 4600 (0.801 sec)
INFO:tensorflow:global_step/sec: 123.988
INFO:tensorflow:loss = 0.3750969, step = 4700 (0.807 sec)
INFO:tensorflow:global_step/sec: 125.77
INFO:tensorflow:loss = 0.3925597, step = 4800 (0.795 sec)
INFO:tensorflow:global_step/sec: 125.459
INFO:tensorflow:loss = 0.43026057, step = 4900 (0.797 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4995...
INFO:tensorflow:Saving checkpoints for 4995 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4995...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 103.654
INFO:tensorflow:loss = 0.41449785, step = 5000 (0.964 sec)
INFO:tensorflow:global_step/sec: 125.195
INFO:tensorflow:loss = 0.3261056, step = 5100 (0.799 sec)
INFO:tensorflow:global_step/sec: 127.462
INFO:tensorflow:loss = 0.35417694, step = 5200 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.844
INFO:tensorflow:loss = 0.4030676, step = 5300 (0.782 sec)
INFO:tensorflow:global_step/sec: 130.923
INFO:tensorflow:loss = 0.4126954, step = 5400 (0.764 sec)
INFO:tensorflow:global_step/sec: 130.98
INFO:tensorflow:loss = 0.32259554, step = 5500 (0.764 sec)
INFO:tensorflow:global_step/sec: 128.589
INFO:tensorflow:loss = 0.3811575, step = 5600 (0.777 sec)
INFO:tensorflow:global_step/sec: 125.918
INFO:tensorflow:loss = 0.40286702, step = 5700 (0.794 sec)
INFO:tensorflow:global_step/sec: 129.137
INFO:tensorflow:loss = 0.32921094, step = 5800 (0.775 sec)
INFO:tensorflow:global_step/sec: 130.104
INFO:tensorflow:loss = 0.4013093, step = 5900 (0.768 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5994...
INFO:tensorflow:Saving checkpoints for 5994 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5994...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 106.58
INFO:tensorflow:loss = 0.31678432, step = 6000 (0.938 sec)
INFO:tensorflow:global_step/sec: 126.52
INFO:tensorflow:loss = 0.3150363, step = 6100 (0.791 sec)
INFO:tensorflow:global_step/sec: 128.723
INFO:tensorflow:loss = 0.39068645, step = 6200 (0.777 sec)
INFO:tensorflow:global_step/sec: 128.852
INFO:tensorflow:loss = 0.2832807, step = 6300 (0.776 sec)
INFO:tensorflow:global_step/sec: 125.866
INFO:tensorflow:loss = 0.32048258, step = 6400 (0.794 sec)
INFO:tensorflow:global_step/sec: 125.299
INFO:tensorflow:loss = 0.38626963, step = 6500 (0.798 sec)
INFO:tensorflow:global_step/sec: 125.342
INFO:tensorflow:loss = 0.39416704, step = 6600 (0.798 sec)
INFO:tensorflow:global_step/sec: 127.235
INFO:tensorflow:loss = 0.30232263, step = 6700 (0.786 sec)
INFO:tensorflow:global_step/sec: 126.055
INFO:tensorflow:loss = 0.41977397, step = 6800 (0.793 sec)
INFO:tensorflow:global_step/sec: 127.477
INFO:tensorflow:loss = 0.47491065, step = 6900 (0.784 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 6993...
INFO:tensorflow:Saving checkpoints for 6993 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 6993...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 104.669
INFO:tensorflow:loss = 0.35919297, step = 7000 (0.955 sec)
INFO:tensorflow:global_step/sec: 126.032
INFO:tensorflow:loss = 0.42433387, step = 7100 (0.794 sec)
INFO:tensorflow:global_step/sec: 126.079
INFO:tensorflow:loss = 0.3359905, step = 7200 (0.793 sec)
INFO:tensorflow:global_step/sec: 124.834
INFO:tensorflow:loss = 0.4118205, step = 7300 (0.801 sec)
INFO:tensorflow:global_step/sec: 126.048
INFO:tensorflow:loss = 0.3594822, step = 7400 (0.793 sec)
INFO:tensorflow:global_step/sec: 127.316
INFO:tensorflow:loss = 0.3544901, step = 7500 (0.786 sec)
INFO:tensorflow:global_step/sec: 125.922
INFO:tensorflow:loss = 0.3517708, step = 7600 (0.794 sec)
INFO:tensorflow:global_step/sec: 127.473
INFO:tensorflow:loss = 0.32316074, step = 7700 (0.784 sec)
INFO:tensorflow:global_step/sec: 127.602
INFO:tensorflow:loss = 0.28583208, step = 7800 (0.784 sec)
INFO:tensorflow:global_step/sec: 128.23
INFO:tensorflow:loss = 0.379911, step = 7900 (0.780 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 7992...
INFO:tensorflow:Saving checkpoints for 7992 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 7992...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 105.424
INFO:tensorflow:loss = 0.3968008, step = 8000 (0.948 sec)
INFO:tensorflow:global_step/sec: 128.249
INFO:tensorflow:loss = 0.43308416, step = 8100 (0.780 sec)
INFO:tensorflow:global_step/sec: 128.472
INFO:tensorflow:loss = 0.42253828, step = 8200 (0.778 sec)
INFO:tensorflow:global_step/sec: 125.642
INFO:tensorflow:loss = 0.39132017, step = 8300 (0.796 sec)
INFO:tensorflow:global_step/sec: 128.607
INFO:tensorflow:loss = 0.30107036, step = 8400 (0.777 sec)
INFO:tensorflow:global_step/sec: 126.434
INFO:tensorflow:loss = 0.30194753, step = 8500 (0.791 sec)
INFO:tensorflow:global_step/sec: 127.391
INFO:tensorflow:loss = 0.30165237, step = 8600 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.042
INFO:tensorflow:loss = 0.44196972, step = 8700 (0.787 sec)
INFO:tensorflow:global_step/sec: 126.923
INFO:tensorflow:loss = 0.42164555, step = 8800 (0.788 sec)
INFO:tensorflow:global_step/sec: 127.39
INFO:tensorflow:loss = 0.3490799, step = 8900 (0.785 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 8991...
INFO:tensorflow:Saving checkpoints for 8991 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 8991...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 105.59
INFO:tensorflow:loss = 0.31310123, step = 9000 (0.947 sec)
INFO:tensorflow:global_step/sec: 124.933
INFO:tensorflow:loss = 0.4325568, step = 9100 (0.801 sec)
INFO:tensorflow:global_step/sec: 127.59
INFO:tensorflow:loss = 0.30360752, step = 9200 (0.784 sec)
INFO:tensorflow:global_step/sec: 130.138
INFO:tensorflow:loss = 0.29442087, step = 9300 (0.768 sec)
INFO:tensorflow:global_step/sec: 130.544
INFO:tensorflow:loss = 0.31136292, step = 9400 (0.766 sec)
INFO:tensorflow:global_step/sec: 130.849
INFO:tensorflow:loss = 0.34016177, step = 9500 (0.764 sec)
INFO:tensorflow:global_step/sec: 129.551
INFO:tensorflow:loss = 0.39522016, step = 9600 (0.772 sec)
INFO:tensorflow:global_step/sec: 130.262
INFO:tensorflow:loss = 0.34697112, step = 9700 (0.768 sec)
INFO:tensorflow:global_step/sec: 129.093
INFO:tensorflow:loss = 0.38748953, step = 9800 (0.775 sec)
INFO:tensorflow:global_step/sec: 131.427
INFO:tensorflow:loss = 0.29107836, step = 9900 (0.761 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 9990...
INFO:tensorflow:Saving checkpoints for 9990 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 9990...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000...
INFO:tensorflow:Saving checkpoints for 10000 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2024-05-08T09:31:40
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 30.60751s
INFO:tensorflow:Finished evaluation at 2024-05-08-09:32:11
INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.78671, accuracy_baseline = 0.771205, auc = 0.93178755, auc_precision_recall = 0.69907445, average_loss = 0.34567952, global_step = 10000, label/mean = 0.228795, loss = 0.34567845, precision = 0.7014421, prediction/mean = 0.23119248, recall = 0.117987715
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Performing the final export in the end of training.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:81: build_parsing_serving_input_receiver_fn (from tensorflow_estimator.python.estimator.export.export) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/export/export.py:312: ServingInputReceiver.__new__ (from tensorflow_estimator.python.estimator.export.export) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:786: ClassificationOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/binary_class_head.py:561: RegressionOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/binary_class_head.py:563: PredictOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:168: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://tensorflowcn.cn/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/model_utils/export_utils.py:83: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://tensorflowcn.cn/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1715160731/assets
INFO:tensorflow:SavedModel written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1715160731/saved_model.pb
INFO:tensorflow:Loss for final step: 0.33868045.
INFO:absl:Training complete. Model written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving. ModelRun written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6
INFO:absl:Exporting eval_savedmodel for TFMA.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/model_utils/export_utils.py:345: _SupervisedOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA/temp-1715160733/assets
INFO:tensorflow:SavedModel written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA/temp-1715160733/saved_model.pb
INFO:absl:Exported eval_savedmodel to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/serving_model_dir/saved_model.pb"
INFO:absl:Serving model copied to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-Serving.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/eval_model_dir/saved_model.pb"
INFO:absl:Eval model copied to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA.
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized

使用 TensorBoard 分析训练

可选地，我们可以将 TensorBoard 连接到 Trainer 以分析模型的训练曲线。

# Get the URI of the output artifact representing the training logs, which is a directory
model_run_dir = trainer.outputs['model_run'].get()[0].uri

%load_ext tensorboard
%tensorboard --logdir {model_run_dir}

Evaluator

The Evaluator 组件计算评估集上的模型性能指标。它使用 TensorFlow 模型分析库。The Evaluator 还可以选择验证新训练的模型是否优于以前的模型。这在您可能每天自动训练和验证模型的生产管道设置中很有用。在此笔记本中，我们只训练了一个模型，因此 Evaluator 将自动将模型标记为“良好”。

Evaluator 将使用来自 ExampleGen 的数据、来自 Trainer 的训练模型以及切片配置作为输入。切片配置允许您根据特征值切片指标（例如，您的模型在上午 8 点开始的出租车行程与晚上 8 点开始的出租车行程的性能如何？）。请参见下面此配置的示例

eval_config = tfma.EvalConfig(
    model_specs=[
        # Using signature 'eval' implies the use of an EvalSavedModel. To use
        # a serving model remove the signature to defaults to 'serving_default'
        # and add a label_key.
        tfma.ModelSpec(signature_name='eval')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount')
            ],
            # To add validation thresholds for metrics saved with the model,
            # add them keyed by metric name to the thresholds map.
            thresholds = {
                'accuracy': tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.5}),
                    # Change threshold will be ignored if there is no
                    # baseline model resolved from MLMD (first run).
                    change_threshold=tfma.GenericChangeThreshold(
                       direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                       absolute={'value': -1e-10}))
            }
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])

接下来，我们将此配置提供给 Evaluator 并运行它。

# Use TFMA to compute a evaluation statistics over features of a model and
# validate them against a baseline.

# The model resolver is only required if performing model validation in addition
# to evaluation. In this case we validate against the latest blessed model. If
# no model has been blessed before (as in this case) the evaluator will make our
# candidate the first blessed model.
model_resolver = tfx.dsl.Resolver(
      strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
      model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
      model_blessing=tfx.dsl.Channel(
          type=tfx.types.standard_artifacts.ModelBlessing)).with_id(
              'latest_blessed_model_resolver')
context.run(model_resolver)

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)

INFO:absl:Running driver for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running publisher for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running driver for Evaluator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Evaluator
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_eval_shared_model'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Using /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA as  model.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
INFO:absl:The 'example_splits' parameter is not set, using 'eval' split.
INFO:absl:Evaluating model.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_extractors'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:eval_shared_models have model_types: {'tfma_eval'}
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:163: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA/variables/variables
2024-05-08 09:32:18.153481: W tensorflow/c/c_api.cc:305] Operation '{name:'head/metrics/count_3/Assign' id:1368 op device:{requested: '', assigned: ''} def:{ { {node head/metrics/count_3/Assign} } = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](head/metrics/count_3, head/metrics/count_3/Initializer/zeros)} }' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2024-05-08 09:32:18.323322: W tensorflow/c/c_api.cc:305] Operation '{name:'head/metrics/count_3/Assign' id:1368 op device:{requested: '', assigned: ''} def:{ { {node head/metrics/count_3/Assign} } = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](head/metrics/count_3, head/metrics/count_3/Initializer/zeros)} }' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
INFO:absl:Evaluation complete. Results written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Evaluator/evaluation/8.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:112: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Evaluator/blessing/8.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized

现在让我们检查一下 Evaluator 的输出工件。

evaluator.outputs

{'evaluation': OutputChannel(artifact_type=ModelEvaluation, producer_component_id=Evaluator, output_key=evaluation, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'blessing': OutputChannel(artifact_type=ModelBlessing, producer_component_id=Evaluator, output_key=blessing, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

使用 evaluation 输出，我们可以显示整个评估集上全局指标的默认可视化。

context.show(evaluator.outputs['evaluation'])

SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'Overall', 'metrics':…

要查看切片评估指标的可视化，我们可以直接调用 TensorFlow 模型分析库。

import tensorflow_model_analysis as tfma

# Get the TFMA output result path and load the result.
PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
tfma_result = tfma.load_eval_result(PATH_TO_RESULT)

# Show data sliced along feature column trip_start_hour.
tfma.view.render_slicing_metrics(
    tfma_result, slicing_column='trip_start_hour')

SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'trip_start_hour:19',…

此可视化显示了相同的指标，但在 trip_start_hour 的每个特征值处计算，而不是在整个评估集上计算。

TensorFlow 模型分析支持许多其他可视化，例如公平指标和绘制模型性能的时间序列。要了解更多信息，请参见教程。

由于我们在配置中添加了阈值，因此验证输出也可用。The blessing 工件的存在表明我们的模型已通过验证。由于这是执行的第一次验证，因此候选者会自动获得祝福。

blessing_uri = evaluator.outputs['blessing'].get()[0].uri
!ls -l {blessing_uri}

total 0
-rw-rw-r-- 1 kbuilder kbuilder 0 May  8 09:32 BLESSED

现在还可以通过加载验证结果记录来验证成功

PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
print(tfma.load_validation_result(PATH_TO_RESULT))

validation_ok: true
validation_details {
  slicing_details {
    slicing_spec {
    }
    num_matching_slices: 25
  }
}

Pusher

The Pusher 组件通常位于 TFX 管道的末尾。它检查模型是否已通过验证，如果通过，则将模型导出到 _serving_model_dir。

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
context.run(pusher)

INFO:absl:Running driver for Pusher
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Pusher
INFO:absl:Model version: 1715160747
INFO:absl:Model written to serving path /tmpfs/tmp/tmp15vk_44e/serving_model/taxi_simple/1715160747.
INFO:absl:Model pushed to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Pusher/pushed_model/9.
INFO:absl:Running publisher for Pusher
INFO:absl:MetadataStore with DB connection initialized

让我们检查一下 Pusher 的输出工件。

pusher.outputs

{'pushed_model': OutputChannel(artifact_type=PushedModel, producer_component_id=Pusher, output_key=pushed_model, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

特别是，Pusher 将以 SavedModel 格式导出您的模型，它看起来像这样

push_uri = pusher.outputs['pushed_model'].get()[0].uri
model = tf.saved_model.load(push_uri)

for item in model.signatures.items():
  pp.pprint(item)

INFO:absl:Fingerprint not found. Saved model loading will continue.
INFO:absl:path_and_singleprint metric could not be logged. Saved model loading will continue.
('regression',
 <ConcreteFunction () -> Dict[['outputs', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)]] at 0x7F38C020BDF0>)
('predict',
 <ConcreteFunction () -> Dict[['class_ids', TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)], ['classes', TensorSpec(shape=(None, 1), dtype=tf.string, name=None)], ['probabilities', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)], ['logits', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)], ['all_class_ids', TensorSpec(shape=(None, 2), dtype=tf.int32, name=None)], ['logistic', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)], ['all_classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)]] at 0x7F39100E0A90>)
('serving_default',
 <ConcreteFunction () -> Dict[['scores', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)], ['classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)]] at 0x7F38C0424700>)
('classification',
 <ConcreteFunction () -> Dict[['classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)], ['scores', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)]] at 0x7F38C0620D60>)

我们完成了对内置 TFX 组件的介绍！