Abstract—Networks of computers are everywhere. The Internet is one of the most common examples of such a network. Likewise, a distributed system is a network that consists of autonomous computers connected through distributed middleware. For a better understanding of different file systems, a comparative study is required. In this paper, four distributed file system architectures, Google File System, Microsoft Distributed File System, Andrew File System, and Sun Network File System, are reviewed on the basis of performance, scalability, data integrity, security, and heterogeneity.
File System, Sun Network File System, Andrew File System.
I. Introduction [1]
A file system, sometimes referred to as file management and abbreviated FS, is the method and data structure that an operating system uses to keep track of the files on a disk or partition. The term is also used for a partition or disk that stores files, or for the type of file system. A file is a collection of related information recorded on secondary storage; put differently, a file is a collection of logically related entities. A file system usually consists of files separated into groups called directories. Many types of file systems are in common use, and the type determines how data is stored and accessed.
A distributed file system (DFS) is a client/server-based application that allows clients to access and process data stored on a server as if it were on their own machine. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server. A distributed file system organizes the file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. The files requested by a user may be located on different systems in different places around the world; whenever a user requests a file or service, these systems cooperate to provide it to the client. Sharing of resources is the main motive of a DFS.
A distributed operating system runs on multiple independent computers connected through a communication network, but appears to its users as a single virtual machine, while each computer node runs its own OS and has its own memory. The Internet, intranets, and mobile and ubiquitous computing are common examples of distributed systems. Fig. __ shows the architecture of a distributed file system.
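As a rough illustration of the global directory, the Python sketch below assumes a simple mount table in which each server contributes one subtree to a shared namespace. The class `GlobalNamespace`, the server names and the paths are hypothetical and are not taken from any of the reviewed systems; the sketch only shows that a client names a file by its global path while the resolver decides which server actually holds it.

```python
class GlobalNamespace:
    """Hypothetical global directory: maps global path prefixes to server exports."""

    def __init__(self):
        # global prefix -> (server name, directory exported by that server)
        self._mounts = {}

    def mount(self, prefix, server, export):
        self._mounts[prefix.rstrip("/")] = (server, export.rstrip("/"))

    def resolve(self, path):
        """Return (server, local path) for a global path via longest-prefix match."""
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(path)
        server, export = self._mounts[best]
        # the client never sees `server`; it only used the global path
        return server, export + path[len(best):]


if __name__ == "__main__":
    ns = GlobalNamespace()
    ns.mount("/global/home", "server-a", "/export/home")
    ns.mount("/global/projects", "server-b", "/srv/projects")
    print(ns.resolve("/global/projects/dfs/report.txt"))
    # ('server-b', '/srv/projects/dfs/report.txt')
```

Longest-prefix matching is one simple design choice here: it lets a more specific mount override a broader one without the client changing the names it uses.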
[1] Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce," NIRMA University International Conference on Engineering (NUiCONE), 06–08 December 2012.
[2] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System."
Shiva Asadianfam, Mahboubeh Shamsi, et al., "A Review: Distributed File System," International Journal of Computer Networks and Communications Security, Vol. 3, No. 5, May 2015, pp. 229–234.
A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their local node. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server. Distributed file systems are the bedrock of distributed computing in office and engineering environments.
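The download, cache and return cycle described above amounts to a whole-file upload/download model. A minimal sketch, assuming an in-memory server and a single client (the classes `FileServer` and `CachingClient` are hypothetical): the client fetches a copy when it opens a file, reads and writes only its cached copy, and sends the modified copy back when it closes the file. Consistency between several clients caching the same file is deliberately left out.

```python
class FileServer:
    """Hypothetical server that stores whole files by name."""

    def __init__(self):
        self.files = {}

    def fetch(self, name):
        return self.files[name]

    def store(self, name, data):
        self.files[name] = data


class CachingClient:
    """Downloads a copy on open, works on the local cache, writes back on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # file name -> locally cached copy
        self.dirty = set()  # names whose cached copy was modified

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name)  # server sends a copy

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data  # only the local copy changes
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.store(name, self.cache[name])  # modified copy returned
            self.dirty.discard(name)
        del self.cache[name]
```

Until `close` runs, the server still holds the old contents, which is exactly why multi-client consistency needs extra machinery in a real DFS.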
Fig. __: Architecture of Distributed File System [6]
Features of Distributed File System [7]
Transparency: Transparency refers to hiding details from a user. There are four types of transparency.
Structure transparency: Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the clients of a distributed file system.
Access transparency: Both local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client's site.
Naming transparency: The name of a file should not reveal the location of the file, and the name must not change when the file moves from one node to another.
Replication transparency: When files are replicated on multiple nodes, both the existence of multiple copies and their locations should be hidden from the clients.
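Naming and replication transparency can both be pictured as a lookup table that the client never sees directly. The sketch below is a hypothetical name server (the class `NameServer` and the node names are assumptions for illustration): a file keeps the same location-independent name when it migrates, and the client is handed one replica without learning how many copies exist. Keeping the replicas consistent with each other is not modelled.

```python
import random


class NameServer:
    """Hypothetical directory: location-independent file names -> replica nodes."""

    def __init__(self):
        self._replicas = {}  # file name -> list of nodes holding a copy

    def register(self, name, nodes):
        self._replicas[name] = list(nodes)

    def migrate(self, name, old_node, new_node):
        # the copy moves to another node, but the file's name is unchanged
        nodes = self._replicas[name]
        nodes[nodes.index(old_node)] = new_node

    def locate(self, name):
        # the client asks for one logical file; which replica serves it is hidden
        return random.choice(self._replicas[name])
```

Random replica choice is only a placeholder policy; a real system might prefer the nearest or least-loaded node.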
User mobility: A user should not be bound to a specific node but should have the flexibility to work on different machines at different times.
Performance: Performance is measured as the average amount of time needed to satisfy client requests, which includes CPU time plus the time for accessing secondary storage plus network access time. Explicit file placement decisions should not be needed to increase the performance of a distributed file system.
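The performance measure above is a plain average: for n requests, mean response time = (1/n) × Σ (CPU time + secondary-storage time + network time). A small sketch, with the `Request` record and the millisecond figures purely hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Time spent serving one client request, split by component (hypothetical)."""
    cpu_ms: float
    disk_ms: float
    network_ms: float

    def total_ms(self):
        return self.cpu_ms + self.disk_ms + self.network_ms


def mean_response_time_ms(requests):
    """Average time to satisfy a client request (CPU + storage + network)."""
    if not requests:
        raise ValueError("need at least one request")
    return sum(r.total_ms() for r in requests) / len(requests)


if __name__ == "__main__":
    sample = [Request(0.5, 6.0, 2.5), Request(1.5, 10.0, 4.5)]
    print(mean_response_time_ms(sample))  # (9.0 + 16.0) / 2 = 12.5
```

Splitting the time by component makes it easy to see which part (disk or network, typically) dominates for a given workload.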
Data integrity: Concurrent access requests from multiple users who are competing to access the same file must be properly synchronized using some form of concurrency control mechanism. A file system can also provide atomic transactions to users to preserve data integrity.
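The need for concurrency control can be shown with the classic lost-update problem. In this sketch, a single-process `threading.Lock` stands in for whatever locking or transaction mechanism a real distributed file system would use; `SharedFile` and `run_clients` are hypothetical names. Each read-modify-write runs as one critical section, so no client's update is overwritten by another.

```python
import threading


class SharedFile:
    """Hypothetical shared file whose updates are serialized by a lock."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def update(self, fn):
        # read-modify-write as one critical section: two clients cannot
        # both read the old value and then overwrite each other's result
        with self._lock:
            self.value = fn(self.value)


def run_clients(shared, n_clients, n_updates):
    """Start n_clients threads, each incrementing the file n_updates times."""
    def client():
        for _ in range(n_updates):
            shared.update(lambda v: v + 1)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.value


if __name__ == "__main__":
    print(run_clients(SharedFile(), 4, 1000))  # 4000
```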
Challenges of Distributed File System [9]
Concurrency: Concurrency is the circumstance of two or more events happening at the same time. The challenge is how to handle the sharing of resources between clients, since concurrently executing programs share resources such as web pages and files.
No global clock: In a distributed system, computers are connected through a network and each has its own clock. Programs communicate and share data only by exchanging messages, and their coordination depends on time, yet no single clock is shared by all computers.
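One well-known way to order events without a global clock is Lamport's logical clocks, a technique not described in this paper and used here only as an illustration. Each node keeps a counter, stamps outgoing messages with it, and on receipt moves its counter past the sender's timestamp, so a message is always received "later" than it was sent. The class name `LamportClock` is our own.

```python
class LamportClock:
    """Logical clock: orders events using message timestamps only."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # a local event advances the counter
        self.time += 1
        return self.time

    def send(self):
        # sending is itself an event; the result is attached to the message
        return self.tick()

    def receive(self, msg_time):
        # jump past the sender's timestamp, then count the receive event
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()           # local event on node a -> 1
    stamp = a.send()   # a sends a message stamped 2
    print(b.receive(stamp))  # 3: the receive is ordered after the send
```

Logical clocks order causally related events; they do not tell nodes the real (wall-clock) time.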
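Independent failures: Each component of a distributed system can fail independently, leaving the other components unaffected.
Fault tolerance: Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of some of its components.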
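Independent failures are what make fault tolerance possible: if one node holding a copy is down, another copy can still serve the request. The sketch below assumes simple replicated storage; `Node`, `NodeDown` and `fault_tolerant_read` are hypothetical names, and a node being "down" is just a flag.

```python
class NodeDown(Exception):
    """Raised when a (hypothetical) storage node cannot be reached."""


class Node:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data  # key -> stored bytes
        self.up = up

    def read(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def fault_tolerant_read(replicas, key):
    """Try each replica in turn; one failed node does not fail the read."""
    failures = []
    for node in replicas:
        try:
            return node.read(key)
        except NodeDown as exc:
            failures.append(str(exc))
    # only when every replica has failed does the client see an error
    raise RuntimeError(f"all replicas failed: {failures}")
```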
Scalability: Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.
Heterogeneity: Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding processors of the same type, but also by adding dissimilar coprocessors.
Security: Security is one of the most important principles. Because security needs to be pervasive throughout the system, security mechanisms are normally built into the distributed system itself.
The Google File System (GFS) is a highly scalable distributed file system that runs on inexpensive commodity hardware, provides fault tolerance, and delivers high aggregate performance to a large number of clients.
Its design has been driven by observations of Google's application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This led its designers to reexamine traditional choices and explore radically different design points. The file system has successfully met Google's storage needs and serves as its storage platform for the generation and processing of data. At the time of the original GFS paper, the largest cluster provided hundreds of terabytes of storage across thousands of disks on over a thousand machines and was concurrently accessed by hundreds of clients. GFS is one of the most successful real-world applications of distributed systems, with a very high degree of fault tolerance.
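The paragraph above does not say how GFS spreads a file over thousands of disks. According to the GFS paper by Ghemawat, Gobioff and Leung, files are divided into fixed-size 64 MB chunks, each replicated on several chunkservers, and a client first converts a byte offset into a chunk index before asking the master for that chunk's locations. The sketch below shows only that offset arithmetic; the function names are hypothetical and none of the master or chunkserver protocol is modelled.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper


def chunk_index(offset):
    """Index of the chunk that holds a given byte offset within a file."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunk_span(offset, length):
    """Indices of every chunk touched by a read of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # a 20-byte read straddling the first chunk boundary touches chunks 0 and 1
    print(chunk_span(CHUNK_SIZE - 10, 20))  # [0, 1]
```

Large fixed-size chunks keep the master's metadata small, since one entry covers 64 MB of file data.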