NAVER CLOUD PLATFORM

For Platform 2.0 Only

TensorFlow Cluster

Use the CLI to quickly and easily set up a distributed parallel TensorFlow environment in the cloud.

Easy and Convenient Creation of TensorFlow Clusters

Running a large TensorFlow batch job demands substantial hardware resources and long processing times because of the amount of computation involved. With TensorFlow Cluster, you can use the CLI to set up a distributed parallel TensorFlow environment in the cloud with little effort.

Convenient Cluster Creation Based on CLI
After creating a master server (a small VM used for cluster management), you can use CLI commands to automatically configure server nodes with libraries such as TensorFlow preinstalled. You can then easily add or delete worker server and parameter server nodes.
Deployment and Execution of TensorFlow Code and Training Data
Using the default CLI commands, you can create and delete NAS volumes and mount them on all nodes at once, which makes it easy to deploy and run TensorFlow training code and data.
Minimized Editing of TensorFlow Code
Changes to your TensorFlow code are kept to a minimum because the user code can receive the automatically configured cluster information.
Intel MKL Library (Math Kernel Library)
A TensorFlow build with the Intel MKL (Math Kernel Library) is used on Intel Xeon processors to improve deep learning and machine learning performance. Performance is improved by more than 200% on various deep learning and machine learning workloads.

Detailed Features

Various CLI commands are provided for the configuration of TensorFlow Clusters.

Various Server Node Operations

Using CLI commands from the master server, you can create server nodes directly in the cluster without accessing the web console, and stop or restart either all server nodes or individual ones. You can also easily add, delete, or return a subset of the server nodes.

Easy Deployment Feature Using Public Storage

NAS storage is created and added using CLI commands from the master server, without accessing the web console. A single command mounts or unmounts the storage on all nodes at once, which protects user code and training data and reduces their dependency on the cluster. It also solves the problem of deploying and sharing code and data. 500GB is provided by default, and you can add more volumes to process terabytes of training data.

Job Submission of Cluster

Cluster information is automatically configured on each server node so that, when a job is submitted, it can be passed to the user code as a parameter or read as an environment variable. As a result, code that was running on a single machine can use the TensorFlow Cluster environment with minimal modification. You also do not need to kill processes manually before rerunning a job, because the parameter servers stop responding automatically once all worker servers have finished their operations.
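
As a rough illustration of how user code might consume the automatically configured cluster information, the sketch below reads host lists and the node's role from environment variables and builds the dictionary shape that `tf.train.ClusterSpec` accepts in TensorFlow 1.x. The variable names (`TF_PS_HOSTS`, `TF_WORKER_HOSTS`, `TF_JOB_NAME`, `TF_TASK_INDEX`) and the host names are hypothetical placeholders, not names documented for this service; see the manual for the actual ones.

```python
import os


def load_cluster_config(env=None):
    """Build a cluster description from environment variables.

    All variable names here are hypothetical placeholders; the names
    actually set by TensorFlow Cluster are not stated on this page.
    """
    env = os.environ if env is None else env

    def split_hosts(value):
        # "host1:2222, host2:2222" -> ["host1:2222", "host2:2222"]
        return [h.strip() for h in value.split(",") if h.strip()]

    cluster = {
        "ps": split_hosts(env.get("TF_PS_HOSTS", "")),
        "worker": split_hosts(env.get("TF_WORKER_HOSTS", "")),
    }
    job_name = env.get("TF_JOB_NAME", "worker")
    task_index = int(env.get("TF_TASK_INDEX", "0"))

    # Fail early if the role or index does not match the host lists.
    if job_name not in cluster:
        raise ValueError("unknown job name: %r" % job_name)
    if not 0 <= task_index < len(cluster[job_name]):
        raise ValueError(
            "task index %d out of range for job %r" % (task_index, job_name)
        )
    return cluster, job_name, task_index


if __name__ == "__main__":
    # Example values standing in for what the cluster would provide.
    example_env = {
        "TF_PS_HOSTS": "ps0:2222",
        "TF_WORKER_HOSTS": "worker0:2222,worker1:2222",
        "TF_JOB_NAME": "worker",
        "TF_TASK_INDEX": "1",
    }
    cluster, job_name, task_index = load_cluster_config(example_env)
    print(cluster, job_name, task_index)
    # With TensorFlow 1.x installed, these values could be passed on as:
    #   spec = tf.train.ClusterSpec(cluster)
    #   server = tf.train.Server(spec, job_name=job_name, task_index=task_index)
```

Keeping the lookup in one function means the rest of a single-machine script can stay unchanged; only the part that creates the session needs to know which role the node plays.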

View Job Logs of Cluster

Jobs submitted to the cluster run in the background, and their logs are redirected to and consolidated on the master server. You can view server node logs in real time using the CLI monitor command, and view a list of previously performed operations using the CLI history command.

Diverse Server Node Specifications For Your Needs

Choose from the five server specifications below, which are provided by default. (GPU servers are coming soon.)

  • MINI (4 vCPUs, Mem 16GB, HDD 50GB) – A server suitable for cluster testing purposes or handling a small workload.
  • BASIC (8 vCPUs, Mem 32GB, HDD 50GB) – A server suitable for handling a medium workload.
  • HIGH (16 vCPUs, Mem 32GB, HDD 50GB) – A server suitable for handling a large workload.
  • GPU1 (1 GPU, GPU Mem 24GB, 4 vCPUs, Mem 30GB, SSD 50GB) – A single-GPU server; total GPU capacity scales with the number of cluster nodes.
  • GPU2 (2 GPUs, GPU Mem 48GB, 8 vCPUs, Mem 60GB, SSD 50GB) – A dual-GPU server; total GPU capacity scales with the number of cluster nodes. It is suitable for handling very large workloads.
    (Note that all nodes in a cluster have identical specifications and are recognized as worker server nodes. You can set the number of parameter server nodes.)
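
Because every node in a cluster shares one specification, total cluster capacity is the per-node figure multiplied by the node count. The sketch below shows that arithmetic using the values from the list above. The names `NodeSpec`, `NODE_SPECS`, and `cluster_totals` are illustrative only and are not part of the service's CLI or API; the master server is not counted.

```python
from collections import namedtuple

# Per-node resources; field names are illustrative.
NodeSpec = namedtuple("NodeSpec", ["vcpus", "mem_gb", "gpus", "gpu_mem_gb"])

# Values copied from the server specification list on this page.
NODE_SPECS = {
    "MINI": NodeSpec(vcpus=4, mem_gb=16, gpus=0, gpu_mem_gb=0),
    "BASIC": NodeSpec(vcpus=8, mem_gb=32, gpus=0, gpu_mem_gb=0),
    "HIGH": NodeSpec(vcpus=16, mem_gb=32, gpus=0, gpu_mem_gb=0),
    "GPU1": NodeSpec(vcpus=4, mem_gb=30, gpus=1, gpu_mem_gb=24),
    "GPU2": NodeSpec(vcpus=8, mem_gb=60, gpus=2, gpu_mem_gb=48),
}


def cluster_totals(spec_name, node_count):
    """Return aggregate resources for node_count identical server nodes."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    spec = NODE_SPECS[spec_name]
    # Multiply each per-node figure by the number of nodes.
    return {field: getattr(spec, field) * node_count for field in spec._fields}


if __name__ == "__main__":
    print(cluster_totals("GPU2", 4))
```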

Reference

"TensorFlow Cluster" uses TensorFlow, an open source machine learning software library developed by Google Brain.

Pricing Information

No additional fee is charged for TensorFlow Cluster beyond the server fees for the nodes that make up the cluster.
(However, stop the server nodes when you are not submitting cluster jobs to avoid unnecessary charges.)

OS: Ubuntu 16.04 (GPU server nodes will be supported soon)
  ❖ TensorFlow is not installed on the master node, so that the server can be created quickly. Install it yourself if necessary (see the manual).
Version: TensorFlow 1.3 (latest stable)
  (The distributed package version may change as new TensorFlow versions are released.)
Usage Fee (Month):
  ❖ Recommended specification for cluster testing or small workloads: 205,000 KRW per month per server node with 4 vCPUs and 16GB of memory.
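
As a worked example of the pricing note above, the sketch below multiplies the stated per-node monthly fee by the number of server nodes. It assumes every node runs for the full month at the 4 vCPU, 16GB rate. It leaves out the master server, NAS storage, and any saving from stopping nodes, because this page does not give those figures. The function name is illustrative.

```python
# Monthly fee per server node for the 4 vCPU, 16GB specification,
# as stated in the pricing note on this page.
MINI_MONTHLY_FEE_KRW = 205000


def estimate_monthly_fee(node_count, fee_per_node=MINI_MONTHLY_FEE_KRW):
    """Estimate the monthly server-node fee in KRW for identical nodes.

    Assumes all nodes run for the whole month; the master server and
    NAS storage are not included.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return node_count * fee_per_node


if __name__ == "__main__":
    # Four server nodes at the stated rate.
    print(estimate_monthly_fee(4))
```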
