
LinkedIn open-sources a tool to run TensorFlow on Hadoop

LinkedIn has open-sourced a project for scaling and managing deep learning jobs in TensorFlow, using the YARN (Yet Another Resource Negotiator) job scheduling system in Hadoop.

The Tony project came about after LinkedIn tried two existing open source solutions for running scheduled TensorFlow jobs on Hadoop and found both wanting. One, TensorFlow on Spark, runs TensorFlow via Apache Spark’s job engine, but it is coupled too tightly to Spark. The other, TensorFlowOnYARN, provides the same basic functionality as Tony, but it is unmaintained and does not provide fault tolerance.

Deep learning models in TensorFlow need some form of job management. Training models can take hours or days, and the training process needs some guarantee it can complete correctly.

Tony uses YARN’s resource and task scheduling system to set up TensorFlow jobs across a Hadoop cluster, according to LinkedIn’s press notes. It can schedule GPU-based TensorFlow jobs through Hadoop, request different kinds of resources (GPUs vs. CPUs), and allocate memory differently for TensorFlow nodes. It can also ensure that job outputs are saved periodically to HDFS, so that a job that crashes or is interrupted resumes from where it left off.
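The save-and-resume behavior described above can be illustrated with a toy training loop. This is only a sketch of the general checkpointing pattern, not Tony's code or API: the `train` function, its parameters, and the JSON checkpoint format are hypothetical, and a local file stands in for HDFS.

```python
import json
import os


def train(total_steps, checkpoint_path, save_every=10, fail_at=None):
    """Toy training loop that checkpoints periodically and resumes on restart.

    All names here are illustrative; a local JSON file stands in for HDFS.
    """
    # Resume from the last saved state if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss": 100.0}

    while state["step"] < total_steps:
        # Optional simulated failure, to demonstrate recovery.
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated crash")

        state["step"] += 1
        state["loss"] *= 0.9  # stand-in for one real training step

        # Save every `save_every` steps and at the end of the job.
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            # Rename into place so a crash never leaves a partial checkpoint.
            os.replace(tmp_path, checkpoint_path)

    return state
```

If a run is interrupted between checkpoints, calling `train` again with the same `checkpoint_path` picks up from the last saved step, so only the work done since that save is repeated.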

