Easily set up a TensorFlow environment for distributed parallel processing in the cloud using the CLI
You can create server nodes in a cluster simply by running CLI commands on a master server, and stop or restart the nodes all at once or individually without having to access NAVER Cloud Platform's web console. You can add, delete, or return some of the server nodes.
You can create and expand NAS storage simply by running CLI commands on a master server, without having to access NAVER Cloud Platform's web console. A single command mounts or unmounts NAS storage on all of the nodes simultaneously, which helps protect your code and training data. It also reduces dependency on individual clusters and simplifies deploying and sharing code and data. By default, you can create 500 GB of NAS storage and expand it as necessary to accommodate terabytes of training data.
Once a change has been committed on the master server, it is applied automatically to the server nodes. This allows you to run TensorFlow Cluster with minimal changes to the code you were previously running on a single server (VM). After all jobs on the job server are complete, the parameter server shuts down automatically, so no separate cleanup is required before restarting the process.
Jobs in a cluster run in the background, and their logs are redirected to the master server, where they are consolidated. You can view the logs of the server nodes in real time using the CLI monitor command, and list previously executed tasks using the CLI history command.
Choose from the following five server specifications, provided by default (GPU servers will be available soon):
MINI (4 vCPUs, 16 GB memory, 50 GB HDD): suited for testing clusters or handling small-sized workloads
BASIC (8 vCPUs, 32 GB memory, 50 GB HDD): suited for handling medium-sized workloads
HIGH (16 vCPUs, 32 GB memory, 50 GB HDD): suited for handling large-sized workloads
GPU1 (1 GPU, 24 GB GPU memory, 4 vCPUs, 30 GB memory, 50 GB SSD): one GPU per node, so GPU capacity scales in proportion to the number of cluster nodes
GPU2 (2 GPUs, 48 GB GPU memory, 8 vCPUs, 60 GB memory, 50 GB SSD): two GPUs per node, so GPU capacity scales in proportion to the number of cluster nodes (suited for handling very large workloads)
Note that all nodes in a cluster are configured with identical specifications and are all recognized as worker server nodes. You can specify the number of nodes that act as parameter servers.
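The node layout described above (every node is a worker, and a chosen number of nodes also act as parameter servers) can be pictured with a small Python sketch. It is only an illustration, not the service's actual implementation: the function name `build_cluster_layout`, the host names, and the port numbers are hypothetical, and placing the parameter servers on the first nodes is an assumption. The resulting dictionary has the job-name-to-address-list shape that TensorFlow's `tf.train.ClusterSpec` accepts.

```python
def build_cluster_layout(node_hosts, num_ps, ps_port=2222, worker_port=2223):
    """Return a dict mapping job names to lists of "host:port" addresses.

    Assumptions (not taken from the product documentation):
    - every node runs a worker task;
    - the first `num_ps` nodes additionally run a parameter server task;
    - the port numbers are placeholders.
    """
    if not 0 < num_ps <= len(node_hosts):
        raise ValueError("num_ps must be between 1 and the number of nodes")
    return {
        # Parameter servers run on a subset of the nodes.
        "ps": ["%s:%d" % (host, ps_port) for host in node_hosts[:num_ps]],
        # All nodes are treated as worker nodes.
        "worker": ["%s:%d" % (host, worker_port) for host in node_hosts],
    }


if __name__ == "__main__":
    # Hypothetical host names for a three-node cluster with one parameter server.
    print(build_cluster_layout(["node1", "node2", "node3"], num_ps=1))
```

A dictionary like this could be passed to `tf.train.ClusterSpec` in training code; the CLI is described as handling the cluster setup itself, so the sketch is only meant to clarify the worker and parameter server roles.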
NAVER Cloud’s Ncloud TensorFlow Cluster uses TensorFlow, an open-source machine learning software library developed by the Google Brain team.
OS | Version | Usage Fee (Month) |
---|---|---|
Ubuntu 16.04 (GPU server nodes will be supported soon) ❖ To speed up server creation, TensorFlow is not installed on the master node. Install it yourself if necessary (see the manual). | TensorFlow 1.3 (latest stable) (The distributed package version may change as new TensorFlow versions are released.) | ❖ Recommended specification for cluster testing or small workloads: - per month per server node (4 vCPUs, 16 GB memory). |
(VAT Excluded)