
Pin numpy version #6953

Open: wants to merge 1 commit into master



BLOrange-AMD

This PR fixes an incompatible numpy version for pyt_deepspeed_megatron_gpt2 and pyt_train_deepspeed_megatron_gpt2 with the ROCm PyTorch release/2.5 branch.

@BLOrange-AMD BLOrange-AMD requested a review from loadams as a code owner January 15, 2025 23:42
loadams (Contributor) commented Jan 16, 2025

Hi @BLOrange-AMD, I recall that we have a dependency between the torch version and the numpy version. Could you share more information on the error you are seeing?

BLOrange-AMD (Author)

@loadams With ROCm PyTorch 2.5, on the pyt_deepspeed_megatron_gpt2 and pyt_train_deepspeed_megatron_gpt2 models, the newer numpy 2.2.1 is downloaded and used instead of the cached numpy 1.26.4, which causes "RuntimeError: Could not infer dtype of numpy.int64". Pinning the numpy version in DeepSpeed could therefore be a more stable way to solve the issue.
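The thread shows numpy 1.26.4 working and numpy 2.2.1 failing with this torch build. Assuming the break falls at the numpy 2.0 major-version boundary (the thread does not confirm the exact boundary), a minimal stdlib-only sketch of an early guard could look like the following. It fails fast with a readable message instead of the mid-run dtype error. `numpy_below_2` and `check_installed_numpy` are hypothetical helpers, not DeepSpeed or PyTorch APIs.

```python
from importlib import metadata


def numpy_below_2(version: str) -> bool:
    """Return True if a numpy version string belongs to the 1.x series.

    Assumption: the torch build in question only works with numpy<2.
    Only the leading major component is parsed, e.g. '1.26.4' -> 1,
    '2.2.1' -> 2, '2.0.0rc1' -> 2.
    """
    major = int(version.split(".")[0])
    return major < 2


def check_installed_numpy() -> None:
    """Raise early if the installed numpy is 2.x (hypothetical guard)."""
    # Raises importlib.metadata.PackageNotFoundError if numpy is not installed.
    installed = metadata.version("numpy")
    if not numpy_below_2(installed):
        raise RuntimeError(
            f"numpy {installed} is installed, but this torch build is "
            "assumed to require numpy<2"
        )
```

Calling `check_installed_numpy()` at the top of a training script would surface the version mismatch before any tensors are created.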

@loadams loadams changed the title Updated numpy version Pin numpy version Jan 16, 2025
loadams (Contributor) commented Jan 16, 2025

@BLOrange-AMD, I see. However, the issue is that DeepSpeed doesn't strictly require a lower numpy version. Here is a sample workflow that uses numpy>2.0.0. Is there another way to pin the numpy version? Or do you expect this to be fixed soon in torch 2.6?

Or perhaps torch just needs to be built with numpy support too?
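One possible way to pin numpy without changing DeepSpeed's own requirements, sketched under assumptions: apply the pin in the test environment through a pip constraints file. A constraints file (`pip install -c`) only limits the versions of packages pip resolves; it does not itself cause them to be installed. The helper names, the `numpy<2` specifier, and installing `deepspeed` are illustrative assumptions, not the change this PR makes.

```python
import sys
import tempfile
from pathlib import Path


def write_constraints(directory: Path, pins: list) -> Path:
    """Write a pip constraints file with one requirement specifier per line."""
    path = directory / "constraints.txt"
    path.write_text("\n".join(pins) + "\n")
    return path


def pip_install_argv(packages: list, constraints: Path) -> list:
    """Build (but do not run) a pip command that honors the constraints file."""
    return [sys.executable, "-m", "pip", "install", "-c", str(constraints), *packages]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        constraints = write_constraints(Path(tmp), ["numpy<2"])
        argv = pip_install_argv(["deepspeed"], constraints)
        print(" ".join(argv))
        # To actually install: import subprocess; subprocess.check_call(argv)
```

Keeping the pin in the CI environment leaves DeepSpeed free to work with numpy>2.0.0 elsewhere, which matches the concern raised above.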
