Invited Speakers

U. B. Desai
Indian Institute of Technology
Hyderabad, India
Title: IoT and Cyber Physical System: Smarter Societies
Uday B. Desai received the B. Tech. degree from the Indian Institute of Technology, Kanpur, India, in 1974, the M.S. degree from the State University of New York, Buffalo, in 1976, and the Ph.D. degree from The Johns Hopkins University, Baltimore, U.S.A., in 1979, all in Electrical Engineering. He has been the Director of IIT Hyderabad since June 2009. He is also the Mentor Director for IIT Bhilai and IIIT Chittoor.
He has held faculty positions at several universities: he was Assistant and then Associate Professor at Washington State University, and Professor at IIT Bombay. He has held Visiting Associate Professor positions at Arizona State University, Purdue University, and Stanford University, and was a Visiting Professor at EPFL, Lausanne. He has also been the Director of the HP-IITM R and D Lab at IIT-Madras.
His research interests are in cyber-physical systems, the Internet of Things, digital fabrication, wireless communication, cognitive radio, wireless sensor networks, and statistical signal processing. He is a coauthor of 9 research monographs and the author of nearly 300 peer-reviewed papers in international journals and conferences.
He is a member of many central governmental committees and of the governing councils of academic institutions. He was a member of the high-powered committee for the review of AICTE. He is on the board of Tata Communications Limited.
Dr. Desai is a Fellow of the Indian National Science Academy and a Fellow of the Indian National Academy of Engineering. He is a recipient of the J C Bose Fellowship and of the Excellence in Teaching Award from IIT-Bombay. In 2015 he received the Outstanding Alumni Award from University of Buffalo, and in 2016 he received the Distinguished Alumni Award from IIT Kanpur.

Shalabh Bhatnagar
Indian Institute of Science
Bangalore, India
Shalabh Bhatnagar received a Bachelor's degree in Physics (Hons) from the University of Delhi in 1988. He received his Master's and Ph.D. degrees in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1992 and 1997, respectively.
He was a Research Associate at the Institute for Systems Research, University of Maryland, College Park, from 1997 to 2000 and a Divisional Postdoctoral Fellow at the Free University, Amsterdam, from 2000 to 2001. He joined the Department of Computer Science and Automation at the Indian Institute of Science, Bangalore, in December 2001, where he is now a Professor. He has also held visiting faculty positions at the Indian Institute of Technology, Delhi, and the University of Alberta, Canada.
Dr. Bhatnagar's interests are in simulation-based stochastic optimization, stochastic control, and reinforcement learning. He has authored or co-authored more than 120 research articles in various journals and conferences. He is also a coauthor of the book "Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods", published by Springer in 2013. He is a Senior Associate of the International Center for Theoretical Physics (ICTP), Italy, a Fellow of the Indian National Academy of Engineering, and a Fellow of the Institution of Electronics and Telecommunication Engineers.
Title: An Incremental Fast Policy Search using a Single Sample Path.
Abstract:
We consider a modified version of the control problem in a reinforcement learning setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy that optimizes the long-run gamma-discounted transition costs, where gamma lies in [0, 1). Existing approaches also assume access to a generative model/simulator of the underlying MDP, with the hidden premise that realizations of the system dynamics of the MDP for arbitrary policies, in the form of sample paths, can be obtained with ease from the model. We consider a generalized version in which the cost function is the expectation of a non-convex function of the value function, and in which the generative model is not accessible. Rather, we assume that a single sample path generated using an a priori chosen behaviour policy is made available. In this information-restricted setting, we solve the generalized control problem by developing an incremental version of the cross-entropy method. The proposed algorithm is shown to converge to the solution that is globally optimal relative to the chosen behaviour policy. We also present a few experimental results to corroborate our claims.
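For readers unfamiliar with the cross-entropy method mentioned in the abstract, the sketch below shows its classical batch form applied to a scalar non-convex cost. It is only an orientation aid and not the incremental, single-sample-path algorithm presented in the talk. The function names, the Gaussian sampling distribution, the toy cost, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def cross_entropy_minimize(cost, mean=0.0, std=5.0, n_samples=200,
                           elite_frac=0.2, n_iters=50, seed=0):
    """Classical batch cross-entropy method for one scalar parameter.

    Illustrative only: the talk describes an incremental variant that
    works from a single sample path, which this sketch does not implement.
    """
    rng = random.Random(seed)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Draw candidates from the current Gaussian sampling distribution.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the lowest-cost fraction of candidates (the "elite" set).
        elite = sorted(samples, key=cost)[:n_elite]
        # Refit the Gaussian to the elite set.
        mean = sum(elite) / n_elite
        var = sum((x - mean) ** 2 for x in elite) / n_elite
        std = math.sqrt(var) + 1e-6  # small floor avoids a zero-width sampler
    return mean


def toy_cost(x):
    # Hypothetical non-convex cost: a quadratic bowl centred at x = 2 with
    # cosine ripples that create local minima near neighbouring integers.
    return (x - 2.0) ** 2 + 2.0 * (1.0 - math.cos(2.0 * math.pi * (x - 2.0)))


best = cross_entropy_minimize(toy_cost)
print("estimated minimizer:", best, "cost:", toy_cost(best))
```

The batch form above needs a fresh set of samples at every iteration, which is the kind of easy resampling that the setting in the abstract rules out; that restriction is what motivates an incremental version.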