Most satellites require an Attitude Determination and Control System (ADCS) to control the orientation of the satellite after launch. Attitude determination can be done with a variety of on-board sensors such as sun sensors, earth sensors, gyroscopes and magnetometers. Active attitude control requires actuators used in combination with some form of non-linear control system to stabilize the satellite along its three axes. The actuators can be thrusters, reaction wheels or magnetorquers. For nanosatellite applications, however, magnetorquers have been favored over momentum wheels because of the cost and weight limitations of such applications.

Over the years, many control algorithms have been used in combination with other attitude stabilization techniques to dictate the orientation of a satellite body. Using only magnetorquers for three-axis stabilization raises an issue: at any instant the system is controllable about only two axes, because a magnetorquer cannot produce torque along the local geomagnetic field. Thus, attitude control cannot be achieved using time-invariant control systems; time-varying or discontinuous feedback control systems are required instead. Nonlinear low-gain Proportional Derivative (PD) and Proportional Integral Derivative (PID) based controllers are widely used, as they produce almost global asymptotic attitude regulation without the need for momentum wheels. In this paper, this concept is taken further by replacing the PD or PID controllers with a reinforcement learning based system consisting of two neural networks trained by a feedback loop. The paper introduces the proposed attitude determination and control system.

The paper is organized as follows. Section II describes recent work on learning based controllers and ADCS models in general. The proposed model for this paper is introduced in Section III, with emphasis on the orbit propagator.
Section IV describes the Reinforcement Learning system implemented, and Section V gives the parameters of the controller used in our model. Section VI concludes the paper with a few results obtained and a brief mention of future work that can be done on this topic.
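
The two-axis controllability limit of magnetorquer-only control mentioned in this introduction can be illustrated with a short numerical sketch. It assumes the standard magnetic torque relation, torque = m x B, where m is the commanded dipole moment and B is the local geomagnetic field in the body frame. The function names and all numerical values below are hypothetical and are not taken from the proposed model; the point is only that the torque component along B is zero for every choice of m.

```python
# Illustrative sketch only: the field sample and dipole values are assumed,
# not parameters of the ADCS model described in this paper.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def magnetorquer_torque(m, B):
    """Torque (N*m) from dipole moment m (A*m^2) in field B (T), body frame."""
    return cross(m, B)

# Assumed geomagnetic field sample in the body frame, in tesla.
B = (2.0e-5, -1.0e-5, 3.0e-5)

# A few arbitrary dipole commands, in A*m^2.
dipoles = [(0.1, 0.0, 0.0), (0.0, 0.2, -0.1), (0.05, 0.05, 0.05)]

for m in dipoles:
    tau = magnetorquer_torque(m, B)
    # The projection of the torque onto B vanishes up to rounding error,
    # so no dipole command can produce torque about the field direction.
    print("m =", m, "tau =", tau, "tau . B =", dot(tau, B))
```

Because B changes direction along the orbit, the uncontrollable axis also changes over time, which is why time-varying or discontinuous feedback can achieve three-axis regulation where time-invariant feedback cannot.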