Advanced Usage: Faster processing
For larger datasets, you can speed up Galileo data processing by running it on your GPU
NVIDIA's cuML libraries require CUDA 11.X to work properly. You can check your CUDA version by running:
nvcc -V
Do not rely on nvidia-smi for this: it reports the highest CUDA version your driver supports, not the version of the installed CUDA toolkit. To learn more about this installation or to do it manually, see the installation guide.
If you are training on datasets with millions of samples and notice that Galileo processing slows down at the "Dimensionality Reduction" stage, you can optionally run those steps on the GPU that you are training your model with.
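If you want to script the CUDA version check described above, the following is a minimal sketch. It assumes nvcc prints a line containing "release 11.8, V11.8.89" (the usual format of its version banner); the helper names parse_nvcc_release and toolkit_version are hypothetical and not part of dataquality.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 11.8, V11.8.89'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_version() -> Optional[Tuple[int, int]]:
    """Run `nvcc -V` if nvcc is on PATH; return None when it is unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "-V"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    version = toolkit_version()
    if version is None:
        print("nvcc not found, or its output was not recognized")
    elif version[0] == 11:
        print(f"CUDA {version[0]}.{version[1]} detected: matches the 11.X requirement")
    else:
        print(f"CUDA {version[0]}.{version[1]} detected: CUDA 11.X is required")
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser against a saved nvcc banner on a machine without a GPU.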
To use this feature, install dataquality with the [cuda] extra:
pip install 'dataquality[cuda]' --extra-index-url=https://pypi.ngc.nvidia.com/
We pass extra-index-url to the install because the additional required packages are hosted by NVIDIA on its own package index, not on the standard PyPI repository.
After running that installation, dataquality will automatically detect the available libraries and use your GPU to apply the dimensionality reduction.
Please validate that the installation ran correctly by running the following in your environment:
import cuml
This must complete successfully.
We install the NVIDIA libraries via pip. This is experimental, and there are a number of other ways to install these libraries. If you'd like to install them without dataquality, you can do so by following NVIDIA's official documentation.
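The import check above can also be run as a small script that reports why the import failed, which is useful in notebooks or CI. This is only a sketch: check_import is a hypothetical helper, and the broad except clause is an assumption made so that errors raised while the CUDA libraries load are reported rather than crashing the script.

```python
import importlib
from typing import Tuple


def check_import(module_name: str) -> Tuple[bool, str]:
    """Try to import a module; return (success, version string or error text)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or an error raised while loading CUDA
        return False, f"{type(exc).__name__}: {exc}"
    # Not every module exposes __version__, so fall back to a placeholder.
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    ok, detail = check_import("cuml")
    if ok:
        print(f"cuml imported successfully ({detail})")
    else:
        print(f"cuml could not be imported: {detail}")
```

An actual import is attempted here, rather than only looking the package up, because a package can be installed and still fail to load when the CUDA version does not match.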
dataquality specifically needs the cuml-cu11 library, which in turn depends on the following packages:
ucx-py-cu11
rmm-cu11
raft-dask-cu11
pylibraft-cu11
dask-cudf-cu11
cudf-cu11
[As of 03-13-2023] We install all of these libraries pinned at version 22.12. This is because after 22.12, NVIDIA changed the location where they host their packages from https://pypi.ngc.nvidia.com/ to https://pypi.nvidia.com/. This caused a number of issues, and the current suggestion from the community is to stay pinned at 22.12 until the issue is resolved. For more information, see this issue.
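To confirm that an environment actually has the packages listed above at the pinned release, a check along these lines may help. It is a sketch under assumptions: pin_report is a hypothetical helper, and it assumes the installed version strings look like "22.12" or "22.12.x".

```python
from importlib import metadata
from typing import Callable, Dict, Iterable

# Package names taken from the list above.
CU11_PACKAGES = (
    "cuml-cu11",
    "ucx-py-cu11",
    "rmm-cu11",
    "raft-dask-cu11",
    "pylibraft-cu11",
    "dask-cudf-cu11",
    "cudf-cu11",
)


def pin_report(
    packages: Iterable[str] = CU11_PACKAGES,
    prefix: str = "22.12",
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Map each package name to 'ok', 'not installed', or the unexpected version."""
    report = {}
    for name in packages:
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if installed == prefix or installed.startswith(prefix + "."):
            report[name] = "ok"
        else:
            report[name] = f"unexpected version {installed}"
    return report


if __name__ == "__main__":
    for package, status in pin_report().items():
        print(f"{package}: {status}")
```

The lookup parameter defaults to importlib.metadata.version and exists only so the function can be exercised without the NVIDIA packages installed.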