Noam Chomsky in 1959 made a pioneering contribution to the mathematical study of phrase structure rewriting systems that established a hierarchy of grammars (now known as Chomsky Hierarchy, CH), based on their generative capacities. In this hierarchy, both the finite state grammars (FSG)and the context free grammars (CFG) have been extensively studied in the context of the weak and strong generative capacities as they relate to natural languages(NL). It has been clear for a long time that these grammars were inadequate in the context of NL. More powerful systems(but constrained in the 'right way')were required for the description of NL structures. Grammars in CH beyond CFG are too powerful, both in terms of weak and strong generative capacities. Joshi (1985) introduced the notion of Mildly Context Sensitive Languages (MCSL) and a generative grammar system called Tree Adjoining Grammar(TAG, or LTAG for the lexicalized version) which has both the weak and strong generative capacity appropriate for modeling NL. Later, Vijay-Shanker, Weir, and Joshi (1987) established the weak equivalence among LTAG, CCG (Combinatory Categorial Grammars), Head Grammars (HG), and Linear Indexed Grammars (LIG).

Both LTAG and its weakly equivalent TL-MCTAG and CCG are being actively pursued in Computational Linguistics(CL), both formally as well as empirically (statistically). LTAG and TL-LTAG are also being pursued for their adequacy in another sense of 'empirically', in an 'astronomical sense', i.e., if a structure exists in an annotated treebank then it has to be accounted for.

I will attempt to present these ideas to an audience much wider than the usual computational linguistic audience.

For more than two decades, temporal difference Reinforcement Learning (RL) has received a lot of attention as a theoretically grounded approach to learning behavior policies in sequential decision making tasks from online experience. Based on the theory of Markov Decision Processes and dynamic programming, important properties have been proven regarding its convergence to globally optimal policies under a variety of assumptions. However, when scaling up RL to large continuous domains with imperfect representations and hierarchical structure, we often try applying algorithms that are proven to converge in small finite domains, and then just hope for the best.

This talk will advocate instead designing algorithms that adhere to the constraints, and indeed take advantage of the opportunities, that might come with the problem at hand. Drawing on several different research threads within the Learning Agents Research Group at UT Austin, I will discuss four types of issues that arise from these contraints and opportunities: 1) Representation - choosing the algorithm for the problem's representation and adapating the representation to fit the algorithm; 2) Interaction - with other agents and with human trainers; 3) Synthesis - of different algorithms for the same problem and of different concepts in the same algorithm; and 4) Mortality - the opportunity to improve learning based on past experience and the constraint that one can't explore exhaustively.

Learning with expert advice, multi-armed bandits, dynamic pricing, or the dark pool problem can be viewed as specific instances of partial monitoring games where in each round the learner selects an action, the adversary selects an outcome secretly and the learner observes a value that is a function of both the action and outcome. The difficulty of a partial monitoring game is defined as the minimax growth rate of the learner’s regret as the number of rounds increases. It is well known that some of these games are harder, while some others are easier, depending on the "information-cost structure" of the game. How should this structure be taken into account when designing a strategy to minimize the regret. What is the spectrum of difficulty of these games? In this talk we will answer these questions, giving an essentially complete characterization of the difficulty of learning under partial monitoring; proving that from the point of view of the growth rate of regret, all finite partial monitoring games fall into one of four categories. We also give efficient algorithms that achieve the optimal rates in each case up to logarithmic factors. We will also discuss extensions to partial monitoring with side information and some remaining open problems.

This is joint work with Gabor Bartok (ETH Zurich), David Pal (Google) and Andras Antos (MTA SZTAKI).

Topic revision: r24 - 2014-01-05 - 17:42:19 - Main.ddioch2

Copyright 2016 The Board of Trustees of the University of Illinois.webmaster@cs.uic.edu |
WISEST Helping Women Faculty Advance Funded by NSF |