to_static
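Converts a dynamic-graph layer that carries distributed sharding information into a static-graph distributed model, which can then be trained in static-graph mode. It also converts the data iterator loader used in dynamic-graph mode into the data iterator used for static-graph distributed training.

paddle.distributed.to_static returns a DistModel instance and a DistributedDataLoader instance. The DistModel instance wraps the converted static-graph model and provides interfaces for training, evaluation, and prediction. The DistributedDataLoader instance loads data during static-graph distributed training.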
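Parameters
layer (paddle.nn.Layer) - The model, annotated with distributed information, that can be trained in a distributed fashion in dynamic-graph mode.
loader (paddle.io.DataLoader) - The data iterator used for dynamic-graph training.
loss (Loss|Callable|None, optional) - The loss function. Required when the model is to be trained or evaluated.
optimizer (Optimizer|None, optional) - The optimizer. Required when the model is to be trained.
strategy (Strategy|None, optional) - The distributed training configuration, used to set options such as mixed-precision training and distributed optimization strategies.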
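The requirements on loss and optimizer can be summarized as a small check. The sketch below is plain Python and does not call Paddle. missing_args and REQUIRED_BY_MODE are hypothetical names introduced here for illustration only, and Paddle's own validation and error messages may differ.

```python
# Hypothetical helper, not part of Paddle: reports which optional
# arguments of to_static are still needed for a given mode.
REQUIRED_BY_MODE = {
    "train": ("loss", "optimizer"),  # training needs both
    "eval": ("loss",),               # evaluation needs only the loss
    "predict": (),                   # prediction needs neither
}


def missing_args(mode, loss=None, optimizer=None):
    """Return the names of required arguments that are None for `mode`."""
    given = {"loss": loss, "optimizer": optimizer}
    return [name for name in REQUIRED_BY_MODE[mode] if given[name] is None]


# A loss is supplied but no optimizer, so training is not yet possible.
print(missing_args("train", loss=object()))  # ['optimizer']
```

An empty list means the arguments passed are sufficient for that mode.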
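Returns
DistModel: The model used for static-graph distributed training. Training, evaluation, and prediction are all run through its __call__ method. Before running any of them, call train(), eval(), or predict() on the DistModel instance to switch it to the corresponding mode. The default mode of a DistModel instance is determined by the arguments passed to paddle.distributed.to_static: when both loss and optimizer are given, the default mode is train; when loss is given and optimizer is None, the default mode is eval; when both loss and optimizer are None, the default mode is predict.
DistributedDataLoader: The data iterator used for static-graph distributed training. It is used in the same way as paddle.io.DataLoader.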
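The default-mode rule described for DistModel can be written out as a short decision function. This is a minimal sketch in plain Python, assuming only the three cases stated above. default_mode is a hypothetical name used for illustration and is not a Paddle API.

```python
def default_mode(loss=None, optimizer=None):
    """Mirror the documented default-mode rule (illustrative only)."""
    if loss is not None and optimizer is not None:
        return "train"    # both given
    if loss is not None and optimizer is None:
        return "eval"     # loss only
    if loss is None and optimizer is None:
        return "predict"  # neither given
    # An optimizer without a loss is not covered by the description,
    # so this sketch reports no default for that case.
    return None


print(default_mode(loss=object(), optimizer=object()))  # train
print(default_mode(loss=object()))                      # eval
print(default_mode())                                   # predict
```

Whatever the default, the mode can still be switched explicitly with train(), eval(), or predict() before calling the model.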
Code example
>>> import numpy as np
>>> import paddle
>>> import paddle.distributed as dist
>>> from paddle import nn
>>> from paddle.distributed import Replicate, Shard
>>> BATCH_SIZE = 4
>>> BATCH_NUM = 4
>>> IMAGE_SIZE = 16
>>> CLASS_NUM = 8
>>> class RandomDataset(paddle.io.Dataset):
...     def __init__(self, images, labels, num_samples):
...         self.images = images
...         self.labels = labels
...         self.num_samples = num_samples
...     def __getitem__(self, idx):
...         return self.images[idx], self.labels[idx]
...     def __len__(self):
...         return self.num_samples
>>> class DemoNet(nn.Layer):
...     def __init__(self, mesh):
...         super().__init__()
...         self._mesh = mesh
...         self.linear_0 = nn.Linear(IMAGE_SIZE, IMAGE_SIZE)
...         self.linear_1 = nn.Linear(IMAGE_SIZE, CLASS_NUM)
...         self.relu = nn.ReLU()
...         # shard the weights of this layer
...         self.linear_0.weight = dist.shard_tensor(
...             self.linear_0.weight,
...             self._mesh,
...             [Shard(1)],
...             stop_gradient=False,
...         )
...         self.linear_1.weight = dist.shard_tensor(
...             self.linear_1.weight,
...             self._mesh,
...             [Shard(0)],
...             stop_gradient=False,
...         )
...     def forward(self, x):
...         out = self.linear_0(x)
...         out = self.relu(out)
...         out = self.linear_1(out)
...         return out
>>> images = np.random.rand(BATCH_SIZE, IMAGE_SIZE).astype('float32')
>>> labels = np.random.rand(BATCH_SIZE, CLASS_NUM).astype('float32')
>>> dataset = RandomDataset(images, labels, BATCH_SIZE)
>>> loader = paddle.io.DataLoader(dataset, batch_size=BATCH_SIZE)
>>> mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
>>> layer = DemoNet(mesh)
>>> opt = paddle.optimizer.SGD(
...     learning_rate=0.1, parameters=layer.parameters()
... )
>>> loss_fn = nn.MSELoss()
>>> dist_loader = dist.shard_dataloader(loader, meshes=[mesh])
>>> dist_model = dist.to_static(
...     layer, dist_loader, loss_fn, opt
... )
>>> # training
>>> dist_model.train()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in train mode, executing the __call__ method will
...     # update the parameters of the model and return the
...     # loss
...     loss = dist_model(image, label)
>>> # evaluation
>>> dist_model.eval()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in eval mode, executing the __call__ method will
...     # return the loss
...     loss = dist_model(image, label)
>>> # prediction
>>> dist_model.predict()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in predict mode, executing the __call__ method will
...     # return a dict that contains the outputs of the model,
...     # where the value of "out0" is the first output.
...     outs = dist_model(image)
>>> # This example needs to be run in a multi-card environment:
>>> # export CUDA_VISIBLE_DEVICES=0,1
>>> # python -m paddle.distributed.launch {test_case}.py