Fix env image build failures and liveness probe timeouts in share-models notebook CI #4000
Draft
Copilot wants to merge 5 commits into
Copilot (AI) changed the title from "[WIP] Fix failing GitHub Actions job build" to "Harden share-models notebook against transient online deployment probe failures in CI" on Jun 5, 2026
- Update `env_train/Dockerfile`: Python 3.8.13→3.10, remove EOL packages, relax version constraints that blocked builds
- Update `artifacts/model/conda.yaml`: drop the old `pip<=22.0.4` constraint and the `azureml-ai-monitoring`/contrib packages, update to Python 3.10
- Notebook: swap the `HttpResponseError` try/except retry for `ProbeSettings` (`initial_delay=300`), which fixes the root cause of the liveness probe timeouts instead of masking them
Copilot (AI) changed the title from "Harden share-models notebook against transient online deployment probe failures in CI" to "Fix env image build failures and liveness probe timeouts in share-models notebook CI" on Jun 5, 2026
CI was failing in two distinct ways: the `SKLearnEnv` Docker build was breaking due to EOL Python 3.8 and overly restrictive legacy package pins, and the managed online deployment was hitting liveness probe 502s before the MLflow serving container finished loading.

`env_train/Dockerfile`
- `python:3.8.13` → `python:3.10` (3.8 EOL Oct 2024)
- Removed EOL packages and relaxed version constraints that blocked builds (`matplotlib`, `psutil`, `tqdm`, `ipykernel`; `pandas>=1.3,<3.0`, `scipy>=1.7,<2.0`, `numpy>=1.21,<2.0`)

`artifacts/model/conda.yaml`
- Dropped `pip<=22.0.4` (breaks modern dependency resolution) and the deprecated `azureml-ai-monitoring`/`azureml-contrib-services`
- Updated to `python=3.10.*` and `mlflow>=2.0`, relaxed remaining pins

Notebook deployment cell
- Replaced the `try/except HttpResponseError` retry loop with `ProbeSettings(initial_delay=300)`. This is the correct fix for a slow-starting MLflow container: the probe waits for the container to load, rather than the notebook catching the failure after the deployment has already been declared dead.
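The snippet that followed this description was not captured on the page, so the block below is only a rough sketch of what such a deployment cell might look like. It assumes the Azure ML Python SDK v2 (`azure-ai-ml`), whose `azure.ai.ml.entities` module provides `ProbeSettings` and `ManagedOnlineDeployment`. Only `initial_delay=300` comes from the PR; the other probe values, the helper names `probe_kwargs` and `build_deployment`, and the default instance type are hypothetical.

```python
# Sketch only. initial_delay=300 is from the PR; every other value is an assumption.


def probe_kwargs(initial_delay: int = 300) -> dict:
    """Keyword arguments for a liveness probe that tolerates a slow container start."""
    return {
        "initial_delay": initial_delay,  # seconds to wait before the first probe
        "period": 10,                    # assumed: seconds between probes
        "timeout": 10,                   # assumed: seconds before one probe times out
        "failure_threshold": 30,         # assumed: failures tolerated before restart
        "success_threshold": 1,          # assumed: successes needed to count as healthy
    }


try:
    # Requires the azure-ai-ml package (SDK v2).
    from azure.ai.ml.entities import ManagedOnlineDeployment, ProbeSettings
except ImportError:
    ManagedOnlineDeployment = ProbeSettings = None


def build_deployment(name, endpoint_name, model, instance_type="Standard_DS3_v2"):
    """Build (but do not submit) a deployment with the relaxed liveness probe."""
    if ProbeSettings is None:
        raise RuntimeError("azure-ai-ml is not installed")
    return ManagedOnlineDeployment(
        name=name,
        endpoint_name=endpoint_name,
        model=model,
        instance_type=instance_type,  # hypothetical default, not from the PR
        instance_count=1,
        liveness_probe=ProbeSettings(**probe_kwargs()),
    )
```

The resulting object would then be passed to the workspace client as before (for example via `ml_client.online_deployments.begin_create_or_update(...)`), with no retry wrapper around the call.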