Data Forest is a big data analytics platform that provides a wide range of open-source applications, from big data analytics to machine learning.
It provides an integrated analysis environment for storing and processing large volumes of data and for GPU-based deep learning analysis.
You can create various big data frameworks as apps, which makes configuring your own analytics environment fast and easy.
Run batch operations serverless, without configuring additional infrastructure.
Data Forest provides a reliable Hadoop cluster environment by applying high availability and security technologies. Spend less time managing and monitoring data without compromising security.
You pay only for the resources you use so you can optimize costs for data analysis.
You can respond flexibly to your business demands by scaling your resources as you need them.
It provides an integrated, multitenancy-based platform designed to handle large volumes of data and large numbers of users.
Users can perform batch analysis or long-lived analysis in the multitenancy environment, depending on their analysis purposes.
It provides a convenient monitoring environment, covering application resource usage and HDFS usage.
Alerts and logging let you take necessary measures quickly.
Data Forest is a Hadoop cluster with reinforced security: it provides access control based on Kerberos and LDAP authentication, application permission management using Apache Ranger, and web access authentication using Apache Knox.
Apache Hadoop is open-source software for reliable, scalable distributed computing, and it requires a multi-master node configuration to guarantee high availability. Data Forest provides multi-master node configurations and reliable operational services, so you can work in a dependable environment built on highly available clusters.
It stores data in Apache Hadoop HDFS (Hadoop Distributed File System) and submits tasks to Apache Hadoop YARN, which runs them in the assigned containers. Stored data can be used efficiently in the provided frameworks without moving it.
It provides various open source components such as Apache Spark, Hive, Presto, HBase, Airflow, Zeppelin, and Jupyter.
It provides the TensorFlow and PyTorch libraries for deep learning. (The provided frameworks may change depending on internal circumstances.)
Data Forest is a cost-effective service: you pay only for the resources you use. Infrastructure and software pricing are combined and charged per minute, with a minimum of 1 minute.
The plans are divided into Common, Public Queue, and Private Queue.
- Common: charges for the data stored in HDFS and for the infrastructure and software used to run jobs, with no resource provisioning
  - HDFS: charged by the size of data stored in HDFS (Hadoop Distributed File System), with no resource provisioning
  - Public Queue - Job: fees for the infrastructure and software used to run jobs in public queues
  - Private Queue - Job: fees for the software used to run jobs in private queues
- Public Queue: serverless type; apps are assigned resources from the public queue when created
  - App: infrastructure and software fees based on the usage of user-configured application resources, which can be scaled up or down as needed
- Private Queue: a dedicated computing resource (private queue) is created for the user, and apps are assigned resources from it when created
  - Node: server usage fees for creating and running private queues
  - App: software fees based on the usage of user-configured application resources in the private queue
HDFS: You are charged per minute based on the size (GB) stored, starting from the time the data is saved in HDFS.
Job: When you request Hadoop YARN to execute tasks such as Hive, Spark, or MapReduce, the fees for vCPU, memory, and GPU are summed over the execution period.
Classification | Details | Charged item | Billing standard | Fee (Infrastructure + Software) |
---|---|---|---|---|
Common | HDFS | GB | Minute | - |
Common | Public Queue - Job | vCPU | Minute | - |
Common | Public Queue - Job | GPU | Minute | - |
Common | Public Queue - Job | Memory(GB) | Minute | - |
Common | (Scheduled) Private Queue - Job | vCPU | Minute | - |
Common | (Scheduled) Private Queue - Job | GPU | Minute | - |
Common | (Scheduled) Private Queue - Job | Memory(GB) | Minute | - |
(VAT Excluded)
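As a sketch of how the job charge above combines, the fee is each charged item's usage multiplied by a per-minute rate and the execution time, summed. The rates below (`VCPU_RATE`, `MEMORY_RATE`, `GPU_RATE`) are hypothetical placeholders standing in for the "-" fees in the table, not published prices.

```python
# Hypothetical per-minute rates; substitute the published Data Forest fees.
VCPU_RATE = 0.01     # fee per vCPU per minute (placeholder)
MEMORY_RATE = 0.005  # fee per GB of memory per minute (placeholder)
GPU_RATE = 0.5       # fee per GPU per minute (placeholder)

def job_fee(vcpus: int, memory_gb: int, minutes: int, gpus: int = 0) -> float:
    """Fee for one YARN job: each charged item accrues per minute over the run."""
    return (vcpus * VCPU_RATE + memory_gb * MEMORY_RATE + gpus * GPU_RATE) * minutes

# A Spark job using 100 vCPUs and 300 GB of memory for 900 minutes:
print(job_fee(vcpus=100, memory_gb=300, minutes=900))
```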
App: Applications such as HBase, Kafka, OpenTSDB, Presto, Elasticsearch, Kibana, Zeppelin, Grafana, and HUE are charged per application, broken down into vCPU, memory, and GPU for the time their containers exist in Hadoop YARN.
Classification | Details | Charged item | Billing standard | Fee (Infrastructure + Software) |
---|---|---|---|---|
Public Queue | App | vCPU | Minute | - |
Public Queue | App | GPU | Minute | - |
Public Queue | App | Memory(GB) | Minute | - |
(VAT Excluded)
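A minimal sketch of the app billing above, again with hypothetical per-minute rates. Each component's fee scales with its container count, and the Application Master container that YARN creates alongside the app (1 vCPU and 4 GB of memory in the examples later in this document) is billed like any other component.

```python
# Hypothetical per-minute rates; substitute the published fees.
VCPU_RATE = 0.01     # placeholder
MEMORY_RATE = 0.005  # placeholder

def app_fee(components: list[tuple[int, int, int]], minutes: int) -> float:
    """components: (containers, vcpus, memory_gb) per component,
    including the Application Master container created with the app."""
    return sum(
        containers * (vcpus * VCPU_RATE + memory_gb * MEMORY_RATE) * minutes
        for containers, vcpus, memory_gb in components
    )

# Zeppelin: one 4 vCPU / 12 GB container plus a 1 vCPU / 4 GB Application Master,
# running for 30 days (43,200 minutes):
print(app_fee([(1, 4, 12), (1, 1, 4)], minutes=43_200))
```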
Node: A server fee is charged for the servers created when user-dedicated computing resources are used.
Classification | Details | Charged item | Billing standard | Infrastructure fee |
---|---|---|---|---|
Private Queue | Node | vCPU 32EA, Memory 128GB, SSD 100GB | Hour(s) | - |
Private Queue | Node | vCPU 32EA, Memory 256GB, SSD 100GB | Hour(s) | - |
Private Queue | Node | V100 1EA, GPU Memory 32GB, vCPU 8EA, Memory 90GB, SSD 100GB | Hour(s) | - |
Private Queue | Node | V100 2EA, GPU Memory 64GB, vCPU 16EA, Memory 180GB, SSD 100GB | Hour(s) | - |
Private Queue | Node | V100 4EA, GPU Memory 128GB, vCPU 32EA, Memory 360GB, SSD 100GB | Hour(s) | - |
(VAT Excluded)
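Nodes are billed per hour rather than per minute, so a month of continuous use amounts to 30 * 24 = 720 hours per node. A sketch with a hypothetical hourly rate:

```python
NODE_HOURLY_RATE = 1.2  # hypothetical fee per node-hour; see the "-" fees above

def node_fee(node_count: int, hours: int) -> float:
    """Server fee for a private queue: each node is billed per hour it exists."""
    return node_count * NODE_HOURLY_RATE * hours

# Two nodes kept for 30 days (720 hours each):
print(node_fee(node_count=2, hours=30 * 24))
```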
App: Applications such as HBase, Kafka, OpenTSDB, Presto, Elasticsearch, Kibana, Zeppelin, Grafana, and HUE are charged per application, broken down into vCPU, memory, and GPU for the time their containers exist in Hadoop YARN. (Infrastructure usage fees are included in the Node fees.)
Classification | Details | Charged item | Billing standard | Software fee |
---|---|---|---|---|
Private Queue | App | vCPU | Minute | - |
Private Queue | App | GPU | Minute | - |
Private Queue | App | Memory(GB) | Minute | - |
(VAT Excluded)
Outbound traffic usage is billed based on the pricing table.
The pricing table of network usage can be viewed at Pricing > Pricing plan by region > Network.
Data Forest Notebooks are compute instances that run the Jupyter Notebook app.
To use Notebooks, you must first create an Account in Data Forest.
You are billed for server instances based on the amount of time you use them.
Classification | Details | Charged item | Billing standard | Fee |
---|---|---|---|---|
Notebooks | Server | vCPU 4EA, Memory 16GB, Disk 50GB | Minute | - |
Notebooks | Server | vCPU 4EA, Memory 32GB, Disk 50GB | Minute | - |
Notebooks | Server | vCPU 8EA, Memory 16GB, Disk 50GB | Minute | - |
Notebooks | Server | vCPU 8EA, Memory 32GB, Disk 50GB | Minute | - |
Notebooks | Server | vCPU 8EA, Memory 64GB, Disk 50GB | Minute | - |
(VAT Excluded)
Outbound traffic usage is billed based on the pricing table.
The pricing table of network usage can be viewed at Pricing > Pricing plan by region > Network.
Notebooks data is kept on data storage.
You can scale your storage capacity from 100 GB to 2,000 GB in 10 GB increments. If you need more capacity, you can choose either 4,000 GB or 6,000 GB.
For more information on data storage pricing, go to Pricing > Pricing for each region > (Region) > Block Storage.
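The allowed capacities therefore form a constrained set: 100 GB to 2,000 GB in 10 GB steps, plus the two fixed sizes of 4,000 GB and 6,000 GB. A minimal sketch of that rule:

```python
def is_valid_storage_gb(size_gb: int) -> bool:
    """Notebooks data storage: 100-2,000 GB in 10 GB increments,
    or exactly 4,000 GB or 6,000 GB."""
    if size_gb in (4_000, 6_000):
        return True
    return 100 <= size_gb <= 2_000 and size_gb % 10 == 0

assert is_valid_storage_gb(100) and is_valid_storage_gb(1_990) and is_valid_storage_gb(6_000)
assert not is_valid_storage_gb(105) and not is_valid_storage_gb(3_000)
```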
100 vCPUs and 300 GB of memory used for 30 minutes every day for 30 days for Spark jobs in Public Queue
Data of 500 GB stored on average for 30 days in HDFS
Classification | Details | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|
Common | Spark Job | 100 | 300 | - | 900 |
Common | HDFS | - | - | 500 | 43200
Data Forest usage fee:
Common - Public Queue - Job - vCPU fee
(Number of vCPUs * vCPU fees * task execution time (min))
(100 * - * 900) = -
Common - Public Queue - Job - Memory fee
(Memory usage * GB fees * task execution time (min))
(300 * - * 900) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(500 * - * 43200) = -
Total usage fees = -
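The same calculation expressed in Python, with hypothetical rates standing in for the "-" fees (900 = 30 minutes x 30 days of job time; 43,200 = 30 days of storage in minutes):

```python
vcpu_rate, memory_rate, hdfs_rate = 0.01, 0.005, 0.0001  # placeholders for the "-" fees

job_minutes = 30 * 30            # 30 minutes a day for 30 days = 900
storage_minutes = 30 * 24 * 60   # 30 days = 43,200 minutes

vcpu_fee = 100 * vcpu_rate * job_minutes      # (100 * - * 900)
memory_fee = 300 * memory_rate * job_minutes  # (300 * - * 900)
hdfs_fee = 500 * hdfs_rate * storage_minutes  # (500 * - * 43200)

print(vcpu_fee + memory_fee + hdfs_fee)  # total usage fee
```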
100 vCPUs and 300 GB of memory used for 1 hour every day for 30 days for Spark jobs in Public Queue
50 vCPUs and 200 GB of memory used for 2 hours every day for 30 days for Hive jobs in Public Queue
Data of 800 GB stored on average for 30 days in HDFS
Classification | Details | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|
Common | Spark Job | 100 | 300 | - | 1800 |
Common | Hive Job | 50 | 200 | - | 3600
Common | HDFS | - | - | 800 | 43200
Data Forest usage fee:
Common - Public Queue - Job - vCPU fee
Spark job (number of vCPUs * vCPU fees * task execution time (min)) + Hive job (number of vCPUs * vCPU fees * task execution time (min))
(100 * - * 1800) + (50 * - * 3600) = -
Common - Public Queue - Job - Memory fee
Spark job (memory usage * GB fees * task execution time (min)) + Hive job (memory usage * GB fees * task execution time (min))
(300 * - * 1800) + (200 * - * 3600) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(800 * - * 43200) = -
Total usage fees = -
The Zeppelin application uses 4 vCPUs and 12 GB of memory with one container for 30 days in Public Queue
Classification | Details | Component | Number of containers | vCPU | Memory(GB) | Usage time (min) |
---|---|---|---|---|---|---|
Public Queue | Zeppelin Application | Zeppelin | 1 | 4 | 12 | 43200 |
Public Queue | Zeppelin Application | Application Master | 1 | 1 | 4 | 43200
Data Forest usage fee:
Public Queue - App - vCPU fee
Zeppelin component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of containers * number of vCPUs * vCPU fees * task execution time (min))
(1 * 4 * - * 43200) + (1 * 1 * - * 43200) = -
Public Queue - App - Memory fee
Zeppelin component (number of containers * memory usage * GB fees * task execution time (min)) + Application Master component (number of containers * memory usage * GB fees * task execution time (min))
(1 * 12 * - * 43200) + (1 * 4 * - * 43200) = -
Total usage fees = -
The Zookeeper application has two components, zkweb and zkserver: zkweb uses 2 vCPUs and 4 GB of memory with one container, and zkserver uses 2 vCPUs and 4 GB of memory with 3 containers, for 30 days in Public Queue
The HBase application has components including hbasemaster, regionserver, thrift, thrift2, and rest in Public Queue. hbasemaster uses 4 vCPUs and 16 GB of memory with 2 containers for 30 days; regionserver uses 4 vCPUs and 32 GB of memory with 3 containers for 30 days, and 3 more regionserver containers are added mid-month and used for 10 days
Data of 500 GB stored on average for 30 days in HDFS
Classification | Details | Component | Number of containers | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|---|---|
Public Queue | Zookeeper Application | zkweb | 1 | 2 | 4 | - | 43200 |
Public Queue | Zookeeper Application | zkserver | 3 | 2 | 4 | - | 43200
Public Queue | Zookeeper Application | Application Master | 1 | 1 | 4 | - | 43200
Public Queue | HBase Application | hbasemaster | 2 | 4 | 16 | - | 43200
Public Queue | HBase Application | regionserver | 3 | 4 | 32 | - | 43200
Public Queue | HBase Application | regionserver (addition) | 3 | 4 | 32 | - | 14400
Public Queue | HBase Application | thrift | - | - | - | - | -
Public Queue | HBase Application | thrift2 | - | - | - | - | -
Public Queue | HBase Application | rest | - | - | - | - | -
Public Queue | HBase Application | Application Master | 1 | 1 | 4 | - | 43200
Common | HDFS | - | - | - | - | 500 | 43200 |
Data Forest usage fee:
Zookeeper application
Public Queue - App - vCPU fee
zkweb component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + zkserver component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of containers * number of vCPUs * vCPU fees * task execution time (min))
(1 * 2 * - * 43200) + (3 * 2 * - * 43200) + (1 * 1 * - * 43200) = -
Public Queue - App - Memory fee
zkweb component (number of containers * memory usage * GB fees * task execution time (min)) + zkserver component (number of containers * memory usage * GB fees * task execution time (min)) + Application Master component (number of containers * memory usage * GB fees * task execution time (min))
(1 * 4 * - * 43200) + (3 * 4 * - * 43200) + (1 * 4 * - * 43200) = -
HBase application
Public Queue - App - vCPU fee
hbasemaster component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + regionserver component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + regionserver (addition) component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of containers * number of vCPUs * vCPU fees * task execution time (min))
(2 * 4 * - * 43200) + (3 * 4 * - * 43200) + (3 * 4 * - * 14400) + (1 * 1 * - * 43200) = -
Public Queue - App - Memory fee
hbasemaster component (number of containers * memory usage * GB fees * task execution time (min)) + regionserver component (number of containers * memory usage * GB fees * task execution time (min)) + regionserver (addition) component (number of containers * memory usage * GB fees * task execution time (min)) + Application Master component (number of containers * memory usage * GB fees * task execution time (min))
(2 * 16 * - * 43200) + (3 * 32 * - * 43200) + (3 * 32 * - * 14400) + (1 * 4 * - * 43200) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(500 * - * 43200) = -
Total usage fees = -
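The regionserver scale-out above shows how containers added mid-period are billed only for their own lifetime: each group of containers contributes its own (containers * resources * rate * minutes) term. A sketch of the HBase portion with hypothetical rates:

```python
vcpu_rate, memory_rate = 0.01, 0.005  # hypothetical per-minute rates

# (containers, vcpus, memory_gb, minutes) for each billed group of the HBase app
hbase_groups = [
    (2, 4, 16, 43_200),  # hbasemaster, 30 days
    (3, 4, 32, 43_200),  # regionserver, 30 days
    (3, 4, 32, 14_400),  # regionservers added mid-month, 10 days
    (1, 1, 4, 43_200),   # Application Master, 30 days
]

total = sum(c * (v * vcpu_rate + m * memory_rate) * t for c, v, m, t in hbase_groups)
print(total)
```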
An AI app on the V100 GPU model uses 1 GPU, 8 vCPUs, and 12 GB of memory for 6 hours every day for 30 days in Public Queue
Data of 500 GB stored on average for 30 days in HDFS
Classification | Details | Component | GPU | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|---|---|
Public Queue | AI App | TensorFlow | 1 | 8 | 12 | - | 10800
Public Queue | AI App | Application Master | - | 1 | 1 | - | 10800
Common | HDFS | - | - | - | - | 500 | 43200
Data Forest usage fee:
Public Queue - AI app - GPU fee
(Number of GPUs * GPU fees * task execution time (min))
(1 * - * 10800) = -
Public Queue - AI app - vCPU fee
(number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of vCPUs * vCPU fees * task execution time (min))
(8 * - * 10800) + (1 * - * 10800) = -
Public Queue - AI app - Memory fee
(memory usage * GB fees * task execution time (min)) + Application Master component (memory usage * GB fees * task execution time (min))
(12 * - * 10800) + (1 * - * 10800) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(500 * - * 43200) = -
Total usage fees = -
Two servers with 32 vCPUs, 128 GB memory, and 100 GB SSD each, used for 30 days as Private Queue nodes
32 vCPUs and 128 GB of memory used for 18 hours every day for 30 days for Spark jobs in private queues
24 vCPUs and 96 GB of memory used for 20 hours every day for 30 days for Hive jobs in private queues
Data of 800 GB stored on average for 30 days in HDFS
Classification | Details | Number of nodes | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|---|
Private Queue | Node | 2 | 32 | 128 | - | 43200 |
Common | Spark Job | - | 32 | 128 | - | 32400 |
Common | Hive Job | - | 24 | 96 | - | 36000
Common | HDFS | - | - | - | 800 | 43200
Data Forest usage fee:
Private Queue - Node fee
Node fee (number of nodes * fee for vCPU 32EA, Memory 128GB, SSD 100GB * node usage time (hr))
(2 * - * 720) = -
Common - Private Queue - Job - vCPU fee
Spark job (number of vCPUs * vCPU fees * task execution time (min)) + Hive job (number of vCPUs * vCPU fees * task execution time (min))
(32 * - * 32400) + (24 * - * 36000) = -
Common - Private Queue - Job - Memory fee
Spark job (memory usage * GB fees * task execution time (min)) + Hive job (memory usage * GB fees * task execution time (min))
(128 * - * 32400) + (96 * - * 36000) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(800 * - * 43200) = -
Total usage fees = -
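Note the mixed billing units in this example: the node fee accrues per hour (720 hours for 30 days), while the job and HDFS fees accrue per minute. A combined sketch with hypothetical rates:

```python
node_hourly_rate = 1.2  # hypothetical fee per node-hour
vcpu_rate, memory_rate, hdfs_rate = 0.01, 0.005, 0.0001  # hypothetical per-minute rates

node_fee = 2 * node_hourly_rate * 720                      # two nodes, 30 days = 720 hours
spark_fee = (32 * vcpu_rate + 128 * memory_rate) * 32_400  # 18 h/day * 30 days
hive_fee = (24 * vcpu_rate + 96 * memory_rate) * 36_000    # 20 h/day * 30 days
hdfs_fee = 800 * hdfs_rate * 43_200                        # 30 days of storage

print(node_fee + spark_fee + hive_fee + hdfs_fee)  # total usage fee
```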
One server with 32 vCPUs, 128 GB memory, and 100 GB SSD, used for 30 days as a Private Queue node
The Zeppelin application uses 4 vCPUs and 12 GB of memory per container, with 7 containers, for 30 days in Private Queue
Classification | Details | Number of nodes, number of containers | vCPU | Memory(GB) | Usage time (min) |
---|---|---|---|---|---|
Private Queue | Node | 1 | 32 | 128 | 43200 |
Private Queue | Zeppelin Application | 7 | 4 | 12 | 43200
Private Queue | Application Master | 1 | 1 | 4 | 43200
Data Forest usage fee:
Private Queue - Node fee
Node fee (number of nodes * fee for vCPU 32EA, Memory 128GB, SSD 100GB * node usage time (hr))
(1 * - * 720) = -
Private Queue - App - vCPU fee
Zeppelin component (number of containers * number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of containers * Number of vCPUs * vCPU fees * task execution time (min))
(7 * 4 * - * 43200) + (1 * 1 * - * 43200) = -
Private Queue - App - Memory fee
Zeppelin component (number of containers * memory usage * GB fees * task execution time (min)) + Application Master component (number of containers * memory usage * GB fees * task execution time (min))
(7 * 12 * - * 43200) + (1 * 4 * - * 43200) = -
Total usage fees = -
One server with V100 1EA (GPU memory 32 GB), 8 vCPUs, 90 GB memory, and 100 GB SSD, used for 30 days as a Private Queue node
An AI app on the V100 GPU model in Private Queue uses 1 GPU, 8 vCPUs, and 64 GB of memory for 20 hours every day for 30 days
Data of 500 GB stored on average for 30 days in HDFS
※ When creating an AI app, an Application Master is created, and fees for 1 vCPU and 1 GB of memory are charged.
Classification | Details | Component | Number of nodes | GPU | vCPU | Memory(GB) | HDFS(GB) | Usage time (min) |
---|---|---|---|---|---|---|---|---|
Private Queue | Node | - | 1 | 1 | 8 | 90 | - | 43200
Private Queue | AI App | TensorFlow | - | 1 | 8 | 64 | - | 36000
Private Queue | AI App | Application Master | - | - | 1 | 1 | - | 36000
Common | HDFS | - | - | - | - | - | 500 | 43200
Data Forest usage fee:
Private Queue - Node fee
Node fee (number of nodes * fee for V100 1EA, GPU Memory 32GB, vCPU 8EA, Memory 90GB, SSD 100GB * node usage time (hr))
(1 * - * 720) = -
Private Queue - AI app - GPU fee
(Number of GPUs * GPU fees * task execution time (min))
(1 * - * 36000) = -
Private Queue - AI app - vCPU fee
(number of vCPUs * vCPU fees * task execution time (min)) + Application Master component (number of vCPUs * vCPU fees * task execution time (min))
(8 * - * 36000) + (1 * - * 36000) = -
Private Queue - AI app - Memory fee
(memory usage * GB fees * task execution time (min)) + Application Master component (memory usage * GB fees * task execution time (min))
(64 * - * 36000) + (1 * - * 36000) = -
Common - HDFS fee
(HDFS usage * GB fees * storage time (min))
(500 * - * 43200) = -
Total usage fees = -
A Data Forest Notebook created with 4 vCPUs, 16 GB memory, and 50 GB disk, and used for 2 days and 8 hours
Classification | Details | Number of nodes | vCPU | Memory(GB) | Usage time (min) |
---|---|---|---|---|---|
Notebooks | Node | 1 | 4 | 16 | 3360 |
Data Forest usage fee:
Notebooks fee:
Node fee (number of nodes * fee for vCPU 4EA, Memory 16GB, Disk 50GB * node usage time (min))
(1 * - * 3360) = -
Total usage fees = -
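Since Notebooks are billed per minute, "2 days and 8 hours" converts to (2 * 24 + 8) * 60 = 3,360 minutes. A sketch with a hypothetical per-minute rate:

```python
NOTEBOOK_RATE = 0.02  # hypothetical per-minute fee for the 4 vCPU / 16 GB / 50 GB server

minutes = (2 * 24 + 8) * 60         # 2 days and 8 hours = 3,360 minutes
print(1 * NOTEBOOK_RATE * minutes)  # number of nodes * rate * usage time (min)
```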