Google organized GFS into clusters of computers. A cluster is essentially a
network of computers, and each cluster may contain a very large number of
machines and applications. The GFS architecture contains three kinds of
entities: clients, a master server, and chunkservers. There is generally only a
single master, with one or more clients and many chunkservers. A backup
master is also kept in case the primary master fails.
Client – In GFS, the term “client” refers to any entity that requests a
file. Requests can range from manipulating existing files to creating new
files on the system. Clients can be other computers or computer applications.
The clients are effectively the customers of GFS.
Chunk Servers –
Chunkservers are the critical part of the GFS architecture: they do most of the
hard work and store the data in chunks. Each chunk is 64 MB in size, which is
much larger than the block size of a typical file system. A chunkserver always
sends the data directly to the client and does not involve the master in the
data transfer. The master is mostly involved in control flow; it works like a
traffic director.
Master – The master
acts as the coordinator: all activity in the cluster is monitored by the
master. The master's responsibility is to coordinate the cluster and maintain
the operation log, in which it records its actions. The log is very important
for troubleshooting, because it shows where a failure occurred and at what
time. The master stores changes to the metadata and sends requests to create
and delete chunks to the chunkservers. We can think of the master as the
storehouse: it stores the namespace and the chunk mapping details. The
operation log is continuously monitored. If the log stops being updated, the
system infers that something is wrong and that the master may no longer be
alive; in that case a backup server quickly takes its place so that operations
do not stop. The master uses the metadata to identify which chunk holds the
data the client is looking for and which chunkservers store that chunk. The
master also garbage-collects abandoned chunks.
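The master's bookkeeping described above can be sketched as a few in-memory
maps. This is only an illustrative sketch: the class name `MasterMetadata`, its
method names, and the integer chunk handles are assumptions made for the
example, not the structures GFS actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Hypothetical in-memory view of the master's metadata."""
    # file name -> ordered list of chunk handles (one per chunk of the file)
    file_chunks: dict = field(default_factory=dict)
    # chunk handle -> set of chunkserver ids that hold a replica
    chunk_locations: dict = field(default_factory=dict)
    # append-only record of every metadata change
    operation_log: list = field(default_factory=list)
    next_handle: int = 0

    def create_chunk(self, filename, servers):
        """Add a new chunk to a file and remember where its replicas live."""
        handle = self.next_handle
        self.next_handle += 1
        self.file_chunks.setdefault(filename, []).append(handle)
        self.chunk_locations[handle] = set(servers)
        self.operation_log.append(("create_chunk", filename, handle))
        return handle

    def lookup(self, filename, chunk_index):
        """Answer a client: which chunk is it, and which servers hold it?"""
        handle = self.file_chunks[filename][chunk_index]
        return handle, sorted(self.chunk_locations[handle])

    def garbage_collect(self):
        """Forget chunks that no file refers to any more."""
        live = {h for handles in self.file_chunks.values() for h in handles}
        orphaned = [h for h in self.chunk_locations if h not in live]
        for handle in orphaned:
            del self.chunk_locations[handle]
            self.operation_log.append(("delete_chunk", handle))
        return orphaned
```

Every change is appended to `operation_log` before the method returns, which
mirrors the idea that the log is the master's record of its own actions.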
For disaster recovery
and data safety, GFS makes several copies of each chunk and keeps them on
different chunkservers. Each copy is known as a replica. By default GFS
makes three replicas of every chunk; the master controls this setting and can
create additional replicas as and when needed. The replicas are not stored on
the same chunkserver; they are spread across several chunkservers so that
if one chunkserver is dead or unresponsive, another can
answer and give the required data to the client. This keeps the
system running continuously.
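As a rough illustration of the replication idea, the sketch below picks three
distinct chunkservers for a chunk and falls back to another replica when one
is down. The function names and the purely random placement are assumptions
made for this example; a real placement policy would weigh more factors.

```python
import random

DEFAULT_REPLICAS = 3  # the default replica count described above


def place_replicas(chunkservers, count=DEFAULT_REPLICAS, rng=random):
    """Choose `count` distinct chunkservers to hold one chunk's replicas."""
    servers = list(chunkservers)
    if len(servers) < count:
        raise ValueError("not enough chunkservers for the requested replicas")
    # sample() never returns the same server twice, so no two replicas
    # of a chunk end up on the same chunkserver.
    return rng.sample(servers, count)


def first_live_replica(replicas, alive):
    """Return the first replica location that is still responding, or None."""
    for server in replicas:
        if server in alive:
            return server
    return None
```

With three replicas on three different servers, a read can still be served
after any single chunkserver fails.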
Figure 1: GFS architecture
Figure 1 illustrates
the flow of data from the client to the master and then to the
chunkserver. The client asks the master which chunkserver holds the
data it needs, identifying the data by file name and chunk index. Each
chunkserver periodically sends a heartbeat to tell the master that it is
alive. Once the master knows which chunkservers are available, it sends the
metadata to the client. On that basis the client works out which part of the
file it needs and sends its request to the chunkserver.
Read operation: The application sends
the file name and byte range to the GFS client. The client converts the
byte offset given by the application into a chunk index. The client
then sends the file name and chunk index to the master. The master
looks up which chunkservers hold that chunk and returns the
chunk handle and the locations of the replicas to the client. The client then
contacts a chunkserver directly with the chunk handle and byte range, and
the chunkserver transfers the data to the client. The
master is not involved in the data transfer, so it does not
become the bottleneck of the process.
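The conversion from byte offset to chunk index is simple integer arithmetic
on the 64 MB chunk size. The sketch below shows that conversion and a toy
read path built on it. The dictionary layouts used for the master and the
chunkservers, and the `chunk_size` parameter, are assumptions made purely for
illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given earlier


def to_chunk_requests(offset, length, chunk_size=CHUNK_SIZE):
    """Split a byte range into (chunk_index, start_in_chunk, nbytes) pieces."""
    requests = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size           # which chunk the offset falls in
        start = offset % chunk_size            # position inside that chunk
        nbytes = min(chunk_size - start, end - offset)
        requests.append((index, start, nbytes))
        offset += nbytes
    return requests


def read(master, chunkservers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Toy read path.

    master:       {(filename, chunk_index): (chunk_handle, [server ids])}
    chunkservers: {server id: {chunk_handle: bytes}}
    """
    data = b""
    for index, start, nbytes in to_chunk_requests(offset, length, chunk_size):
        # Control flow: ask the master for the handle and replica locations.
        handle, locations = master[(filename, index)]
        # Data flow: fetch the bytes directly from one chunkserver.
        chunk = chunkservers[locations[0]][handle]
        data += chunk[start:start + nbytes]
    return data
```

The master only ever answers the small lookup; the bytes themselves come
straight from a chunkserver, which is why the master stays off the data path.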
Write operation: The
application sends the file name and the data to the client. The
client sends the file name and chunk index to the master. The master
checks whether any replica of the chunk currently holds control of it (a
lease). If none does, the master grants control to
one replica, which makes that chunkserver the primary. The master then
sends the replica locations and the chunk handle to the client. The client
pushes the data to all of the replicas. After receiving the data, the primary
sends an acknowledgement to the client. The client then sends the write
command to the primary. The primary forwards the write command to the two
secondary replicas in serial order. After the data is written, the secondaries
reply to the primary to acknowledge the write. The primary then
confirms to the client that the write operation is complete. Clients can
perform read and write operations in parallel.
Figure 2: Write operation data flow
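The write steps can be sketched with one primary and two secondary replicas.
This is a simplified sketch: the class names `Replica` and `Master`, their
methods, and the `data_id` argument are assumptions made for the example, and
failure handling and the expiry of the primary's control are left out.

```python
class Replica:
    """Hypothetical chunkserver holding one replica of a chunk."""

    def __init__(self, name):
        self.name = name
        self.buffered = {}  # data id -> bytes pushed by the client, not yet written
        self.chunk = []     # writes applied so far, in order

    def push(self, data_id, data):
        """Data flow: receive and buffer the data, then acknowledge."""
        self.buffered[data_id] = data
        return True

    def apply(self, data_id):
        """Control flow: write the buffered data into the chunk."""
        self.chunk.append(self.buffered.pop(data_id))
        return True


class Master:
    """Hypothetical master that hands control of the chunk to one replica."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.primary = None

    def locate(self):
        if self.primary is None:  # nobody has control yet, so pick a primary
            self.primary = self.replicas[0]
        secondaries = [r for r in self.replicas if r is not self.primary]
        return self.primary, secondaries


def write(master, data_id, data):
    """Client-side write: push the data everywhere, then commit via the primary."""
    primary, secondaries = master.locate()
    acks = [r.push(data_id, data) for r in [primary] + secondaries]
    if not all(acks):
        return False
    primary.apply(data_id)
    # The primary forwards the write command to each secondary in turn.
    return all([s.apply(data_id) for s in secondaries])
```

Because every write goes through the same primary, all replicas apply the
writes in the same order.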