But after executing `fairscale.nn.model_parallel.initialize()`, I actually got:
> initializing model parallel with size 2
> initializing context parallel with size 2
> initializing pipeline with size 2
> initializing ddp with size 2
data groups: [0, 8]
data groups: [1, 9]
data groups: [2, 10]
data groups: [3, 11]
data groups: [4, 12]
data groups: [5, 13]
data groups: [6, 14]
data groups: [7, 15]
model groups: [0, 1]
model groups: [2, 3]
model groups: [4, 5]
model groups: [6, 7]
model groups: [8, 9]
model groups: [10, 11]
model groups: [12, 13]
model groups: [14, 15]
pipeline groups: [0, 4]
pipeline groups: [1, 5]
pipeline groups: [2, 6]
pipeline groups: [3, 7]
pipeline groups: [8, 12]
pipeline groups: [9, 13]
pipeline groups: [10, 14]
pipeline groups: [11, 15]
context groups: [0, 2]
context groups: [1, 3]
context groups: [4, 6]
context groups: [5, 7]
context groups: [8, 10]
context groups: [9, 11]
context groups: [12, 14]
context groups: [13, 15]
I found that the groups are built with:

```python
groups = torch.LongTensor(range(world_size)).reshape(
    data_parallel_size, pipeline_length, context_parallel_size, model_parallel_size
)
```

Are `data_parallel_size` and `pipeline_length` in an incorrect order here?
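For reference, the group strides implied by that reshape can be checked with a small pure-Python sketch (no torch needed; the axis order `(data, pipeline, context, model)` below is taken from the quoted `reshape` call, and the helper names are my own):

```python
from itertools import product

DP, PP, CP, TP = 2, 2, 2, 2          # GPUs = 16, DP = PP = CP = TP = 2
world_size = DP * PP * CP * TP

def rank(d, p, c, t):
    # Row-major flattening of the 4-D index (d, p, c, t),
    # exactly what reshape(DP, PP, CP, TP) over range(world_size) produces.
    return ((d * PP + p) * CP + c) * TP + t

# A group along one axis = fix the other three indices, vary that axis.
model_groups    = [[rank(d, p, c, t) for t in range(TP)]
                   for d, p, c in product(range(DP), range(PP), range(CP))]
context_groups  = [[rank(d, p, c, t) for c in range(CP)]
                   for d, p, t in product(range(DP), range(PP), range(TP))]
pipeline_groups = [[rank(d, p, c, t) for p in range(PP)]
                   for d, c, t in product(range(DP), range(CP), range(TP))]
data_groups     = [[rank(d, p, c, t) for d in range(DP)]
                   for p, c, t in product(range(PP), range(CP), range(TP))]

print(model_groups[0], context_groups[0], pipeline_groups[0], data_groups[0])
# → [0, 1] [0, 2] [0, 4] [0, 8]
```

With this axis order the strides are 1 (model), 2 (context), 4 (pipeline), and 8 (data), which reproduces the printed groups above, so the question is whether that matches the ordering the docstring sample describes.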
Youngluc changed the title from *Hi, Groups division may be incorrect in initialize()* to *Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py* on Aug 30, 2024.
The annotation gives a correct sample: GPUs=16, DP=PP=TP=CP=2.