<top>
<paper_num> 1 </paper_num>
<paper_title> Probabilistic logic learning. </paper_title>
<doi> 10.1.1.10.2504 </doi>
<year> 2003 </year>
<paper_abstract> The past few years have witnessed an significant interest in probabilistic logic learning, i.e. in research lying at the intersection of probabilistic reasoning, logical representations, and machine learning. A rich variety of di#erent formalisms and learning techniques have been developed. This paper provides an introductory survey and overview of the stateof -the-art in probabilistic logic learning through the identification of a number of important probabilistic, logical and learning concepts. </paper_abstract>
<query_num> 101 </query_num>
<text> One of the central open questions in data mining and artificial intelligence, concerns probabilistic logic learning, i.e. the integration of relational or logical representations, probabilistic reasoning mechanisms with machine learning and data mining principles. In the past few years, this question has received a lot of attention and various different approaches have been developed, cf. [82; 41; 88; 66; 73; 45; 60; 61; 57; 81; 52; 77; 2; 55; 87]. This paper provides an introductory survey and overview of those developments that lie at the intersection of logical (or relational) representations, probabilistic reasoning and learning, cf. Figure 1. While doing so, we identify some of the key concepts and techniques underlying probabilistic logic learning. This, in turn, we hope, should allow the reader to appreciate the differences and commonalities between the various approaches and at the same time provide insight into some of the remaining challenges in probabilistic logic learning. </text>

<query_num> 102 </query_num>
<text> Overviews of inductive logic learning and multi-relational data mining can be found in this volume and in [69; 23]. - Early work on incorporating probabilities into logic programs was done by Clark and McCabe [9] and by Shapiro [97]. Subsequently, researchers such as Nilsson [75], Halpern [43], Ng and Subrahmanian [71], and Poole [82] have studied probabilistic logics from a knowledge representational perspective. The aim of this research is more a probabilistic characterization of logic than suitable representations for learning. </text>

<query_num> 103 </query_num>
<text> 4.1 Probabilistic Logical Models The first class of representations extends Bayesian networks with abilities to define probabilities on first order logical or relational interpretations, cf. [82; 41; 73; 45; 57; 81; 31; 52]. Important predecessors of these logical Bayesian networks discussed below, include the work by Breese, Charniak, Bacchus, Goldman, Poole and Wellmann [40; 6; 4; 7; 39]. These predecessors were largely based on the idea of knowledge based model construction (KBMC). In knowledge based model construction, a knowledge base (sometimes in the form of an annotated logic program) is used to describe sets of probabilistic models. A query then results in constructing a particular model that can be used to answer the query. Many of these approaches can represent Bayesian networks. However, they are not direct upgrades of Bayesian networks in the same way that the work by e.g. Haddawy [41] is. This may also explain why – to the authors’ knowledge – Bayesian network learning techniques have not been applied to these representations. </text>

<query_num> 104 </query_num>
<text> As mentioned before, the first approach to directly upgrade Bayesian networks to relational representations, seems due to Peter Haddawy [41]. In his initial approach, the random variables correspond to ground atoms. Furthermore, the need for combining rules was alleviated by requiring that for each random variable, there was at most one instantiation of a clause having the random variable as the head. In later work, Haddawy now together with Ngo [73], used combination rules to deal with the latter problem, and also, the values of the random variables were now written as an extra argument of the atoms, which necessitates the need to add constraints, cf. above. The language, called probabilistic-logic programs (PLPs), was further extended in the knowledge based model construction tradition by distinguishing between logical and probabilistic conditions in clauses. The former then specified the (logical) conditions under which the probabilistic dependencies held. This increased the expressive power and was certainly useful for specifying probabilistic models by hand, but on the other hand, it also significantly extended and therefore complicated the description language. These complications may also explain why – to the authors’ knowledge – no (structural) learning results are known for this formalism. The framework has been applied within medical domains [42; 72].  </text>

<query_num> 105 </query_num>
<text> 4.3 Intermediate Representations The above introduced frameworks integrate probabilistic models with very expressive relational representations or even logic programs. This however comes at a computational cost. Therefore, some recent approaches such as relational Markov models (RMMs) [2], hidden tree Markov models (HTMMs) [28; 21], and logical hidden Markov models (LOHMMs) [55] have tried to upgrade some well understood probabilistic representations, such as (Hidden) Markov Models [84]. These approaches can also be viewed as downgrading one of the more expressive probabilistic logics presented above. Whereas the resulting approaches are typically less expressive than the above introduced ones, the advantages are that (1) they are closely related to the underlying representations, (2) they are – in principle – more efficient, and (3) that the learning algorithms for the underlying representations can almost directly be applied.  </text>

</top>
<top>
<paper_num> 2 </paper_num>
<paper_title> Mitigating denial of service attacks: A tutorial. </paper_title>
<doi> 10.1.1.100.6792 </doi>
<year> 2005 </year>
<paper_abstract> This tutorial describes what Denial of Service (DoS) attacks are, how they can be carried out in IP networks, and how one can defend against them. Distributed DoS (DDoS) attacks are included here as a subset of DoS attacks. A DoS attack has two phases: a deployment and an attack phase. A DoS program must first be deployed on one or more compromised hosts before an attack is possible. Mitigation of DoS attacks requires thus defense mechanisms for both phases. Completely reliable protection against DoS attacks is, however, not possible. There will always be vulnerable hosts in the Internet, and many attack mechanisms are based on ordinary use of protocols. Defense in depth is thus needed to mitigate the effect of DoS attacks. This paper describes shortly many defense mechanisms proposed in the literature. The goal is not to implement all possible defenses. Instead, one should optimize the trade-off between security costs and acquired benefits in handling the most important risks. Mitigation of DoS attacks is thus closely related to risk management. </paper_abstract>
<query_num> 201 </query_num>
<text> Denial of Service (DoS) attacks have proved to be a serious and permanent threat to users, organizations, and infrastructures of the Internet [26]. The primary goal of these attacks is to prevent access to a particular resource like a web server [8]. A large number of defenses against DoS attacks have been proposed in the literature, but none of them gives reliable protection. There will always be vulnerable hosts in the Internet to be used for DoS purposes. In addition, it is very difficult to reliably recognize and filter only attack traffic without causing any collateral damage to legitimate traffic. This paper describes, how DoS attacks can be carried out and how a victim can mitigate them in ordinary IP networks. Especially wireless ad hoc networks have their additional vulnerabilities, but these kind of wireless networks are not the subject of this paper. </text>

<query_num> 202 </query_num>
<text> There are two major reasons making DoS attacks attractive for attackers. The first reason is that there are effective automatic tools available for attacking any victim [9], i.e., expertise is not necessarily required. The second reason is that it is usually impossible to locate an attacker without extensive human interaction [12,59] or without new features in most routers of the Internet [11]. </text>

<query_num> 203 </query_num>
<text> A DoS attack aims in degrading availability. Denial of Service has been defined as the prevention of authorized access to resources or the delaying of time-critical operations [22]. Examples of these resources are network bandwidth, processing capacity, disk space, memory, and static memory structures [8]. DoS attacks can be classified based on the number of sources included in the attack [26]. In a basic DoS attack the attacker uses a single source host to send attack traffic to a victim. In a DDoS attack an attacker uses multiple source hosts to send attack traffic to one or more victims simultaneously. </text>

<query_num> 204 </query_num>
<text> The term intrusion means unauthorized usage or misuse of a computer system [50]. In the context of this paper an intrusion is thus a successful DoS attack where a victim suffers from a degradation of availability for any service [2]. From an attacker’s perspective an attack is successful only, if at least one objective of an attacker is fulfilled. DoS attacks consist of two major phases [26]. In the deployment phase an attacker installs a DoS tool in one or more vulnerable hosts. In the attack phase an attacker coordinates a flooding or a logic attack against a victim. Both of these phases make use of deficiencies in the design or implementation of applications, protocols, and the Internet architecture [37]. A vulnerability is a flaw in security procedures, software, internal system controls, or implementation of an information system that may affect the integrity, confidentiality, and/or availability of data or services [60]. An exploit is a program, script, or technique that makes it possible to take advantage of a vulnerability in a system [2]. </text>

<query_num> 205 </query_num>
<text> This section is DDoS specific, because it describes how a DDoS network can be built. Originally DDoS attack tools were deployed manually, but now worms are typically used for that. Worms are self-propagating malicious software which often either include directly the DDoS attack capability (e.g., Code Red I [41] and Slapper [3]) or contain the possibility to execute arbitrary code (e.g., Code Red II [41]). Some worms, however, are designed to propagate fast without any malicious payload (e.g., Slammer [40]) or their full functionality is not known (e.g., Nimda [57]). Viruses can also be used for the deployment phase to build a large DDoS network, but they cannot replicate automatically by themselves. Typically social engineering is required to get a human to start a program containing a virus (e.g., an E-mail attachment). A target for a DDoS attack has typically some time to prepare, because viruses are identified and reverse-engineered as soon as they are found in the wild. Infected hosts can also be disinfected as soon as antivirus updates are available. A worm, on the contrary, propagates fast and can cause a sudden attack. This makes a worm a more serious deployment tool for DoS attacks. </text>

</top>
<top>
<paper_num> 3 </paper_num>
<paper_title> Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction </paper_title>
<doi> 10.1.1.102.616 </doi>
<year> 2004 </year>
<paper_abstract> Abstract — Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocation of words, attached to specific concepts (e.g. genetic-algorithms and decisiontrees are terms associated to the concept “Machine Learning ”). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by experimenting the ranking function learned from an English corpus in Biology, onto a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approaches compared to the state-of-the-art Support </paper_abstract>
<query_num> 301 </query_num>
<text> Different statistical criteria are used in systems of terminology extraction, for instance ACABIT [8] uses loglikelihood measure [9] and KEA [27] uses TF x IDF measure. The statistical criteria (value of the measures and the rank of each collocation) used in our approach are: Mutual Information (MI) [5] Mutual Information with cube (MI 3 ) [7] Dice Coefficient (Dice) [24] Loglikelihood (L) [9] Number of occurrences + Loglikelihood (OccL) 1 [18] The choice of an interestingness measure, mostly tackled in the literature through statistical and linguistic criteria [7,17,28] is currently viewed as a decision making problem. </text>

<query_num> 302 </query_num>
<text> ROC curve has no sensitivities to the ratio of positives and negatives examples [15] as opposed to other accuracy measures such as Fscore [4]. One advantage of ROC curves is to naturally accommodate ill-balanced distributions and costsensitive learning [8]. </text>

<query_num> 303 </query_num>
<text> The choice of an interestingness measure, mostly tackled in the literature through statistical and linguistic criteria [7,17,28] is currently viewed as a decision making problem. Another approach based on learning an interestingness measure, is proposed by Vivaldi et al. [26]. They represent collocations from the values of the statistical criteria and use Adaboost [20] to automatically construct a discriminant hypothesis. The presented work follows [26] with two main differences i) measures (Dice, OccL) are added to the description of collocations ; ii) the learning problem is one of preference learning [13] instead of discriminant learning. </text>

</top>
<top>
<paper_num> 4 </paper_num>
<paper_title> A Semi-Supervised Feature Clustering Algorithm with Application to Word Sense Disambiguation. </paper_title>
<doi> 10.1.1.118.6481 </doi>
<year> 2005 </year>
<paper_abstract> In this paper we investigate an application of feature clustering for word sense disambiguation, and propose a semisupervised feature clustering algorithm. Compared with other feature clustering methods (ex. supervised feature clustering), it can infer the distribution of class labels over (unseen) features unavailable in training data (labeled data) by the use of the distribution of class labels over (seen) features available in training data. Thus, it can deal with both seen and unseen features in feature clustering process. Our experimental results show that feature clustering can aggressively reduce the dimensionality of feature space, while still maintaining state of the art sense disambiguation accuracy. Furthermore, when combined with a semi-supervised WSD algorithm, semi-supervised feature clustering outperforms other dimensionality reduction techniques, which indicates that using unlabeled data in learning process helps to improve the performance of feature clustering and sense disambiguation. 1 </paper_abstract>
<query_num> 401 </query_num>
<text> This paper deals with word sense disambiguation (WSD) problem, which is to assign an appropriate sense to an occurrence of a word in a given context. Many corpus based statistical methods have been proposed to solve this problem, including supervised learning algorithms (Leacock et al., 1998; Towel and Voorheest, 1998), weakly supervised learning algorithms (Dagan and Itai, 1994; Li and Li, 2004; Mihalcea, 2004; Niu et al., 2005; Park et al., 2000; Yarowsky, 1995), unsupervised learning algorithms (or word sense discrimination) (Pedersen and Bruce, 1997; Schütze, 1998), and knowledge based algorithms (Lesk, 1986; McCarthy et al., 2004). </text>

<query_num> 402 </query_num>
<text> Feature clustering has been extensively studied for the benefit of text categorization and document clustering. In the context of text categorization, supervised feature clustering algorithms (Baker and McCallum, 1998; Bekkerman et al., 2003; Slonim and Tishby, 2001) usually cluster words into groups based on the distribution of class labels over features, which can compress the feature space much more aggressively while still maintaining state of the art classification accuracy. In the context of document clustering, unsupervised feature clustering algorithms (Dhillon, 2001; Dhillon et al., 2002; Dhillon et al., 2003; El-Yaniv and Souroujon, 2001; Slonim and Tishby, 2000) perform word clustering by the use of word-document co-occurrence matrix, which can improve the performance of document clustering by clustering documents over word clusters. </text>

<query_num> 403 </query_num>
<text> For empirical study of dimensionality reduction techniques on WSD task, we evaluated five dimensionality reduction algorithms on the data in English lexical sample (ELS) task of SENSEVAL-3 (Mihalcea et al., 2004)(including all the 57 English words): supervised feature clustering (SuFC) (Baker and McCallum, 1998; Bekkerman et al., 2003; Slonim and Tishby, 2001), iterative double clustering (IDC) (El-Yaniv and Souroujon, 2001), semi-supervised feature clustering (SemiFC) (our algorithm), supervised feature selection (SuFS) (Forman, 2003), and latent semantic indexing (LSI) (Deerwester et. al., 1990). </text>

<query_num> 404 </query_num>
<text> Supervised feature clustering algorithms (Baker and McCallum, 1998; Bekkerman et al., 2003; Slonim and Tishby, 2001) usually cluster words into groups based on the distribution of class labels over features. Baker and McCallum (1998) apply supervised feature clustering based on distributional clustering for text categorization, which can compress the feature space much more aggressively while still maintaining state of the art classification accuracy. Slonim and Tishby (2001) and Bekkerman et. al. (2003) apply information bottleneck method to find word clusters. They present similar results with the work by Baker and McCallum (1998). Slonim and Tishby (2001) goes further to show that when the training sample is small, word clusters can yield significant improvement in classification accuracy. </text>

<query_num> 405 </query_num>
<text> Unsupervised feature clustering algorithms (Dhillon, 2001; Dhillon et al., 2002; Dhillon et al., 2003; El-Yaniv and Souroujon, 2001; Slonim and Tishby, 2000) perform word clustering by the use of word-document co-occurrence matrix, which do not utilize class labels to guide clustering process. Slonim and Tishby (2000), El-Yaniv and Souroujon (2001) and Dhillon et. al. (2003) show that word clusters can improve the performance of document clustering. </text>

</top>
<top>
<paper_num> 5 </paper_num>
<paper_title> Harmony in Motion. </paper_title>
<doi> 10.1.1.119.9881 </doi>
<year> 2007 </year>
<paper_abstract> Cross-modal analysis offers information beyond that extracted from individual modalities. Consider a camcorder having a single microphone in a cocktail-party: it captures several moving visual objects which emit sounds. A task for audio-visual analysis is to identify the number of independent audio-associated visual objects (AVOs), pinpoint the AVOs ’ spatial locations in the video and isolate each corresponding audio component. Part of these problems were considered by prior studies, which were limited to simple cases, e.g., a single AVO or stationary sounds. We describe an approach that seeks to overcome these challenges. It acknowledges the importance of temporal features that are based on significant changes in each modality. A probabilistic formalism identifies temporal coincidences between these features, yielding cross-modal association and visual localization. This association is of particular benefit in harmonic sounds, as it enables subsequent isolation of each audio source. We demonstrate this in challenging experiments, having multiple, simultaneous highly nonstationary AVOs. 1. Cross-Modal Analysis Cross modal analysis is gaining interest in computer vision. Such analysis seeks associations between sources of input data, which have very different natures. Examples of this include registration of images acquired using sensors of different kinds [16], or association of images to text [13], such as in web pages and multimedia subtitles. It also includes audio-visual analysis [24, 26, 30], which has seen a growing expansion of research directions, including lip-reading [8, 14], tracking [25], and spatial localization [7, 10, 18, 19, 23]. This follows evidence of audiovisual cross-modal processing in biology [12]. This work deals with complex scenarios that are sometimes referred to in the literature as a cocktail party [10, 14, 27]: multiple sources exist simultaneously in all modalities. </paper_abstract>
<query_num> 501 </query_num>
<text> Cross modal analysis is gaining interest in computer vision. Such analysis seeks associations between sources of input data, which have very different natures. Examples of this include registration of images acquired using sensors of different kinds [16], or association of images to text [13], such as in web pages and multimedia subtitles. It also includes audio-visual analysis [24, 26, 30], which has seen a growing expansion of research directions, including lip-reading [8, 14], tracking [25], and spatial localization [7, 10, 18, 19, 23]. This follows evidence of audiovisual cross-modal processing in biology [12]. </text>

<query_num> 502 </query_num>
<text> Some of the prior methods considered only parts of these tasks. Others relied on complex audio-visual hardware, such as an array of microphones that are calibrated mutually and with respect to cameras [24, 25]. This yields an approximate spatial localization of audio sources. A single microphone is simpler to set up, but it cannot, on its own, provide audio spatial localization. Hence, locating audio sources using a camera and a single microphone poses a significant computational challenge. In this context, Refs. [18, 23] spatially localize a single audio-associated visual object (AVO). Ref. [7] localizes multiple AVOs if their sounds are repetitive and non-simultaneous. Neither of these studies attempted audio separation. A pioneering exploration of audio separation [10] used complex optimization of mutual information based on Parzen windows. It can automatically localize an AVO if no other sound is present. Results demonstrated in Ref. [30] were mainly of repetitive sounds, without distractions by unrelated moving objects. </text>

</top>
<top>
<paper_num> 6 </paper_num>
<paper_title> Energy Scaling Laws for Distributed Inference in Random Fusion Networks. </paper_title>
<doi> 10.1.1.145.7871 </doi>
<year> 2009 </year>
<paper_abstract> The energy scaling laws of multihop data fusion networks for distributed inference are considered. The fusion network consists of randomly located sensors distributed i.i.d. according to a general spatial distribution in an expanding region. Among the class of data fusion schemes that enable optimal inference at the fusion center for Markov random field (MRF) hypotheses, the scheme with minimum average energy consumption is bounded below by average energy of fusion along the minimum spanning tree, and above by a suboptimal scheme, referred to as Data Fusion for Markov Random Fields (DFMRF). Scaling laws are derived for the optimal and suboptimal fusion policies. It is shown that the average asymptotic energy of the DFMRF scheme is finite for a class of MRF models. </paper_abstract>
<query_num> 601 </query_num>
<text> The seminal work of Gupta and Kumar [11] on the capacity of wireless networks has stimulated extensive studies covering a broad range of networking problems with different performance metrics. See also [12]. Here, we restrict ourselves to related works on energy consumption and data fusion for statistical inference. </text>

<query_num> 602 </query_num>
<text> Results on scaling laws for energy consumption are limited. In [13], energy scaling laws for multihop wireless networks (without any data fusion) are derived under different routing strategies. The issue of node placement for desirable energy scaling has been considered in [14], [15], where it is argued that uniform node placement, routinely considered in the literature, has poor energy performance. It is interesting to note that, for fusion networks, uniform sensor distribution is in fact optimal among a general class of distributions. See Section IV-B. </text>

<query_num> 603 </query_num>
<text> Energy-efficient data fusion has received a great deal of attention over the past decade. See a few recent surveys in [16], [17]. It has been recognized that sensor observations tend to be correlated, and that correlations should be exploited through data fusion. One line of approach is the use of distributed compression with the aim of routing all the measurements to the fusion center. Examples of such approaches can be found in [18]–[20]. </text>

</top>
<top>
<paper_num> 7 </paper_num>
<paper_title> Controlling Access to XML Documents over XML Native and Relational Databases. </paper_title>
<doi> 10.1.1.147.5690 </doi>
<year> 2009 </year>
<paper_abstract> Abstract. In this paper we investigate the feasibility and efficiency of mapping XML data and access control policies onto relational and native XML databases for storage and querying. We developed a re-annotation algorithm that computes the XPath query which designates the XML nodes to be re-annotated when an update operation occurs. The algorithm uses XPath static analysis and our experimental results show that our re-annotation solution is on the average 7 times faster than annotating the entire document. </paper_abstract>
<query_num> 701 </query_num>
<text> This is the first attempt to compare the use of relational and XML databases to store annotated (with accessibility information) XML documents. Annotationbased enforcement techniques have been considered in [3, 7] for rule-based policies. More sophisticated techniques for storing and querying annotations have been investigated [26, 27]. The related problem of optimizing security checks during query evaluation with respect to an annotated document was investigated in [5]. XML access control over relational databases has been also studied in [23]. Our work is different in that we use annotations (materialized approach), whereas Lee et al. check the accessibility of the document on-the-fly. [20] discusses a “function-based” model that translates policy rules to functions (e.g. Java methods) which are subsequently called to check the policy whenever a part of the document is accessed. Security views [10, 16] address the problem of information leaks in the presence of read-only security policies and queries. Security views contain just the information a user is allowed to read; queries to the view can be translated efficiently to queries on the underlying data, foregoing expensive view materialization and maintenance. However, previous work on annotation-based security policies, such as compressed accessibility maps, does not address the problem of keeping the annotations consistent with the policy when the document or policy changes. These techniques have not yet been used directly to provide access control in an XML database system; it appears that doing so would require modifying the database system internals. </text>

<query_num> 702 </query_num>
<text> To store an XML document in a relational database, we first need to create the relational tables used to store the XML document, and in a second phase, produce a relational representation of the XML document using these tables. We employ ShreX [1, 8] to obtain the relational representation of XML documents. ShreX is a system that handles the translation of XML data into relational tables. This includes relational schema creation, document loading and database querying. It takes as input an XML Schema [4, 9, 24] and produces a mapping to create relational tables so that XML documents which are valid instances of the given XML Schema, can be stored. ShreX is also responsible for translating XPath [6] queries to SQL queries which are then evaluated on the relational representation of the XML document. </text>

</top>
<top>
<paper_num> 8 </paper_num>
<paper_title> Learning explicit and implicit visual manifolds by information projection. </paper_title>
<doi> 10.1.1.151.5051 </doi>
<year> 2010 </year>
<paper_abstract> Natural images have a vast amount of visual patterns distributed in a wide spectrum of subspaces of varying complexities and dimensions. Understanding the characteristics of these subspaces and their compositional structures is of fundamental importance for pattern modeling, learning and recognition. In this paper, we start with small image patches and define two types of atomic subspaces: explicit manifolds of low dimensions for structural primitives and implicit manifolds of high dimensions for stochastic textures. Then we present an information theoretical learning framework that derives common models for these manifolds through information projection, and study a manifold pursuit algorithm that clusters image patches into those atomic subspaces and ranks them according to their information gains. We further show how those atomic subspaces change over an image scaling process and how they are composed to form larger and more complex image patterns. Finally, we integrate the implicit and explicit manifolds to form a primal sketch model as a generic representation in early vision and to generate a hybrid image template representation for object category recognition in high level vision. The study of the mathematical structures in the image space sheds lights on some basic questions in human vision, such as atomic elements in visual perception, the perceptual metrics in various manifolds, and the perceptual transitions over image scales. </paper_abstract>
<query_num> 801 </query_num>
<text> Thirdly, in Section 4, we apply the manifold pursuit algorithm in Section 3 to the space of image patches and present experiments for clustering the implicit and explicit manifolds from the space. Also we show the manifolds in a sequence of scaled images to illustrate the transition of these manifolds. Fourthly, we present two case studies in Section 5 that integrate the implicit and explicit manifolds for image representation. One is the primal sketch model for generic images at the middle level (Guo et al., 2007), and the other is the mixed templates for object categories (Si et al., 2009). The two cases demonstrate that the two atomic manifold can be combined to represent general images. </text>

<query_num> 802 </query_num>
<text> It has long been noticed since the 1960s that image signals do not observe the prominent Gaussian distributions. For example, the distribution of the gradients of image intensity ∇I has higher kurtosis and heavier tails than Gaussian, and often remains invariant when the images are down-scaled. This has inspired much research in the 1990s and early 2000s studying the statistics of natural images (Ruderman, 1994; Zhu and Mumford, 1997; Huang and Mumford, 1999; Lee et al., 2003; Mumford and Gidas, 2001). Here, by natural images, people usually mean photos taken in natural scenes which have a rich set of objects of various sizes in a long range of distance from the camera, e.g. trees in a forest. </text>

<query_num> 803 </query_num>
<text> This intuitively explains the transition between explicit manifolds and implicit manifolds over the scaling process. For natural scenes which contain objects in a continuous scale following certain distributions, for example, the object radius r ∼ 1/r 3 in the scene follows a density p(r) ∝ 1/r 3 (Mumford and Gidas, 2001), the individual object must undergo the above perceptual transitions during the zooming process, but the overall local statistics averaged over the entire image remain invariant . We refer to Wang and Zhu (2008); Wu et al. (2008) for more discussion. </text>

</top>
<top>
<paper_num> 9 </paper_num>
<paper_title> REDUS: finding reducible subspaces in high dimensional data. </paper_title>
<doi> 10.1.1.159.2929 </doi>
<year> 2008 </year>
<paper_abstract> Finding latent patterns in high dimensional data is an important research problem with numerous applications. The most well known approaches for high dimensional data analysis are feature selection and dimensionality reduction. Being widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging applications, however, scientists are interested in the local latent patterns held by feature subspaces, which may be invisible via any global transformation. In this paper, we investigate the problem of finding strong linear and nonlinear correlations hidden in feature subspaces of high dimensional data. We formalize this problem as identifying reducible subspaces in the full dimensional space. Intuitively, a reducible subspace is a feature subspace whose intrinsic dimensionality is smaller than the number of features. We present an effective algorithm, REDUS, for finding the reducible subspaces. Two key components of our algorithm are finding the overall reducible subspace, and uncovering the individual reducible subspaces from the overall reducible subspace. A broad experimental evaluation demonstrates the effectiveness of our algorithm. </paper_abstract>
<query_num> 901 </query_num>
<text> The goal of feature selection methods [6, 22, 31, 33] is to find a single representative subset of features that are most relevant for the data mining task at hand, such as classification. </text>

<query_num> 902 </query_num>
<text> Dimensionality reduction [4, 8, 18, 27, 29] is widely used as a key component of many approaches in analyzing high dimensional data. The insight behind dimensionality reduction methods is that a high dimensional dataset may exhibit interesting patterns on a lower dimensional subspace due to correlations among the features. Though very successful in finding the low dimensional structures embedded in a high dimensional space, these methods are usually performed in the full feature space. They aim to model the global latent structure of the data and do not separate the impact of any original features nor identify latent patterns hidden in some feature subspaces. </text>

<query_num> 903 </query_num>
<text> Many methods for modelling correlations can be found in the literature, such as mutual information [10], Pearson correlation [26], and rank correlation [19]. </text>

<query_num> 904 </query_num>
<text> We adopt the concept of intrinsic dimensionality [13] to model the high dimensional correlation. We formalize this problem as finding reducible subspaces in the full dimensional space. Informally, a feature subspace is reducible if its intrinsic dimensionality is smaller than the number of features. Various intrinsic dimensionality estimators have been developed [9, 14, 21]. Our problem formalization does not depend on any particular method for estimating the intrinsic dimensionality. We show two necessary properties that any estimator should satisfy in order to be a generalization of the well-defined concepts in linear case. </text>

<query_num> 905 </query_num>
<text> Feature Selection Feature selection methods [6, 22, 31, 33] try to find a subset of features that are most relevant for certain data mining task, such as classification. In order to find the relevant feature subset, these methods search through various subsets of features and evaluate these subsets according to some criteria. </text>

<query_num> 906 </query_num>
<text> Dimensionality reduction Dimensionality reduction methods can be categorized into linear methods, such as Multi-Dimensional Scaling (MDS) [8] and Principal Component Analysis (PCA)[18], and non-linear methods, such as Local Linear Embedding (LLE) [27], ISOMAP [29], and Laplacian eigenmaps [4]. For high dimensional datasets, if there exist low dimensional subspaces or manifolds embedded in the full dimensional spaces, these methods are successful in identifying these low dimensional embeddings. </text>

<query_num> 907 </query_num>
<text> The goal of correlation clustering methods is to find clusters hidden in projected feature spaces [1, 7, 30]. They can be viewed as combinations of clustering methods and dimensionality reduction methods. Both ORCLUS [1] and 4C [7] can be treated as PCA-lized clustering methods. </text>

</top>
<top>
<paper_num> 10 </paper_num>
<paper_title> Document-Word Co-regularization for Semi-supervised Sentiment Analysis. </paper_title>
<doi> 10.1.1.160.8444 </doi>
<year> 2008 </year>
<paper_abstract> The goal of sentiment prediction is to automatically identify whether a given piece of text expresses positive or negative opinion towards a topic of interest. One can pose sentiment prediction as a standard text categorization problem. However, gathering labeled data turns out to be a bottleneck in the process of building high quality text classifiers. Fortunately, background knowledge is often available in the form of prior information about the sentiment polarity of words in a lexicon. Moreover, in many applications abundant unlabeled data is also available. In this paper, we propose a novel semi-supervised sentiment prediction algorithm that utilizes lexical prior knowledge in conjunction with unlabeled examples. Our method is based on joint sentiment analysis of documents and words based on a bipartite graph representation of the data. We present an empirical study on a diverse collection of sentiment prediction problems which confirms that our semi-supervised lexical models significantly outperform purely supervised and competing semisupervised techniques. </paper_abstract>
<query_num> 1001 </query_num>
<text> Most work in sentiment analysis has focused on identifying positive or negative sentiment in text passages online. These studies can be broadly classified into two categories: knowledge-based approaches and learning-based approaches. Knowledge-based approaches primarily use linguistic models or other forms of background knowledge to classify the sentiment of passages. A large focus of this area is the use and generation of dictionaries capturing the sentiment of words. These methods range from manual approaches of developing domain-dependent lexicons [7] to semi-automated approaches [13, 35, 17], and even an almost fully automated approach [30]. As observed by Ng et al. [19], most semi-automated approaches yield unsatisfactory lexicons, with either high coverage and low precision or vice versa. </text>

<query_num> 1002 </query_num>
<text> More recently, Pang et al. [21] successfully applied a machine learning approach to classifying sentiment for movie reviews. They cast the problem as a text classification task, using a bag-of-words representation of each movie review. They demonstrate that a learning approach performs better than simply counting the positive and negative sentiment terms using a hand-crafted dictionary. </text>

<query_num> 1003 </query_num>
<text> Pang et al. [20] extend their work, by first classifying sentences as subjective versus objective, and then classifying only the subjective sentences based on sentiment polarity. They demonstrate that by focusing only on the subjective sentences in each review they were able to improve the overall sentiment classification accuracy. Such a two-staged approach could also improve our results; however, for this paper we focus only on our advances in the polarity prediction stage. </text>

<query_num> 1004 </query_num>
<text> Durant and Smith [11] also apply a text categorization approach to classification of political alignment in blog posts. Although their data is similar to ours, their task of identifying left versus right political alignment is quite different from our goal of identifying positive and negative sentiment. They focus on improving classification through feature selection, and unlike our work, do not exploit alternative sources of information, such as lexicons and unlabeled data. </text>

</top>
<top>
<paper_num> 11 </paper_num>
<paper_title> Minimum Bayes-Risk Decoding for Statistical Machine Translation. </paper_title>
<doi> 10.1.1.161.6918 </doi>
<year> 2004 </year>
<paper_abstract> We present Minimum Bayes-Risk (MBR) decoding over translation lattices that compactly encode a huge number of translation hypotheses. We describe conditions on the loss function that will enable efficient implementation of MBR decoders on lattices. We introduce an approximation to the BLEU score (Papineni et al., 2001) that satisfies these conditions. The MBR decoding under this approximate BLEU is realized using Weighted Finite State Automata. Our experiments show that the Lattice MBR decoder yields moderate, consistent gains in translation performance over N-best MBR decoding on Arabicto-English, Chinese-to-English and Englishto-Chinese translation tasks. We conduct a range of experiments to understand why Lattice MBR improves upon N-best MBR and study the impact of various parameters on MBR performance. 1 </paper_abstract>
<query_num> 1101 </query_num>
<text> Statistical language processing systems for speech recognition, machine translation or parsing typically employ the Maximum A Posteriori (MAP) decision rule which optimizes the 0-1 loss function. In contrast, these systems are evaluated using metrics based on string-edit distance (Word Error Rate), n-gram overlap (BLEU score (Papineni et al., 2001)), or precision/recall relative to human annotations. Minimum Bayes-Risk (MBR) decoding (Bickel and Doksum, 1977) aims to address this mismatch by selecting the hypothesis that minimizes the expected error in classification. Thus it directly incorporates the loss function into the decision criterion. The approach has been shown to give improvements over the MAP classifier in many areas of natural language processing including automatic speech recognition (Goel and Byrne, 2000), machine translation (Kumar and Byrne, 2004; Zhang and Gildea, 2008), bilingual word alignment (Kumar and Byrne, 2002), and parsing (Goodman, 1996; Titov and Henderson, 2006; Smith and Smith, 2007). </text>

<query_num> 1102 </query_num>
<text> We are interested in performing MBR decoding under a sentence-level BLEU score (Papineni et al., 2001) which behaves like a gain function: it varies between 0 and 1, and a larger value reflects a higher similarity. We will therefore use Equation 1 as the MBR decoder. </text>

<query_num> 1103 </query_num>
<text> In particular, our framework might be useful with translation metrics such as TER (Snover et al., 2006) or METEOR (Lavie and Agarwal, 2007). In contrast to a phrase-based SMT system, a syntax based SMT system (e.g. Zollmann and Venugopal (2006)) can generate a hypergraph that represents a generalized translation lattice with words and hidden tree structures.  </text>

</top>
<top>
<paper_num> 12 </paper_num>
<paper_title> A Comprehensive Survey of Multiagent Reinforcement Learning. </paper_title>
<doi> 10.1.1.163.6030 </doi>
<year> 2008 </year>
<paper_abstract> Abstract—Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents ’ learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim—either explicitly or implicitly—at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided. Index Terms—Distributed control, game theory, multiagent systems, reinforcement learning. I. </paper_abstract>
<query_num> 1201 </query_num>
<text> AMULTIAGENT system [1] can be defined as a group of autonomous, interacting entities sharing a common environment, which they perceive with sensors and upon which they act with actuators [2]. Multiagent systems are finding applications in a wide variety of domains including robotic teams, distributed control, resource management, collaborative decision support systems, data mining, etc. [3], [4]. They may arise as the most natural way of looking at the system, or may provide an alternative perspective on systems that are originally regarded as centralized. For instance, in robotic teams, the control authority is naturally distributed among the robots [4]. In resource management, while resources can be managed by a central authority, identifying each resource with an agent may provide a helpful, distributed perspective on the system [5]. </text>

<query_num> 1202 </query_num>
<text> Evolutionary multiagent learning is a special case of a larger class of techniques originating in optimization theory that explore directly the space of agent behaviors. Other examples in this class include gradient search [24], probabilistic hill climbing [25], and even more general behavior modification heuristics [26]. The contribution of direct policy search to the MARL algorithms is discussed in this paper, but general policy search techniques are not reviewed. This is because, as stated before, we focus on techniques that exploit the structure of the RL problem by learning value functions. </text>

<query_num> 1203 </query_num>
<text> A speedup of MARL can be realized thanks to parallel computation when the agents exploit the decentralized structure of the task. This direction has been investigated in, e.g., [45]–[50]. Experience sharing can help agents with similar tasks to learn faster and better. For instance, agents can exchange information using communication [51], skilled agents may serve as teachers for the learner [52], or the learner may watch and imitate the skilled agents [53]. </text>

<query_num> 1204 </query_num>
<text> Specifying a good MARL goal in the general stochastic game is a difficult challenge, as the agents’ returns are correlated and cannot be maximized independently. Several types of MARL goals have been proposed in the literature, which consider stability of the agent’s learning dynamics [54], adaptation to the changing behavior of the other agents [55], or both [13], [38], [56]–[58]. A detailed analysis of this open problem is given in Section IV. </text>

<query_num> 1205 </query_num>
<text> In [13] and [56], convergence is required for stability, and rationality is added as an adaptation criterion. For an algorithm to be convergent, the authors of [13] and [56] require that the learner converges to a stationary strategy, given that the other agents use an algorithm from a predefined, targeted class of algorithms. Rationality is defined in [13] and [56] as the requirement that the agent converges to a best response when the other agents remain stationary. Though convergence to a Nash equilibrium is not explicitly required, it arises naturally if all the agents in the system are rational and convergent. </text>

<query_num> 1206 </query_num>
<text> The minimax-Q algorithm [38], [39] employs the minimax principle to compute strategies and values for the stage games, and a temporal-difference rule similar to Q-learning to propagate the values across state-action pairs. </text>

</top>
<top>
<paper_num> 13 </paper_num>
<paper_title> SMV: Selective Multi-Versioning STM. </paper_title>
<doi> 10.1.1.170.9679 </doi>
<year> 2011 </year>
<paper_abstract> We present Selective Multi-Versioning (SMV), a new STM that reduces the number of aborts, especially those of long read-only transactions. SMV keeps old object versions as long as they might be useful for some transaction to read. It is able to do so while still allowing reading transactions to be invisible by relying on automatic garbage collection to dispose of obsolete versions. SMV is most suitable for read-dominated workloads, for which it achieves much better performance than previous solutions. It has an up to ×11 throughput improvement over a single-version STM and more than a two-fold improvement over an STM keeping a constant number of versions per object. We show that unlike STMs keeping a constant number of versions, SMV operates successfully even in systems with limited memory. 1. </paper_abstract>
<query_num> 1301 </query_num>
<text> Nevertheless, previously suggested multi-versioned STMs did not fully realize this potential. As we show in this paper, this happened mainly because of an inefficient management of old object versions (see Section 2). Instead of keeping a variable number of versions based on demand, multi-versioned STMs existing today keep a constant number of versions for each object [7, 25, 26]. </text>

<query_num> 1302 </query_num>
<text> In Section 4 we evaluate different aspects of SMV’s performance. We implement SMV in Java 2 and study its behavior using STMBench7 [15] as well as a Java port of Vacation [8]. We compare SMV to a TL2-style algorithm — a single-versioned STM whose basic operation is like that of TL2 [11]; and to a k-versioned variant of the same algorithm, which keeps k previous versions per object, similarly to LSA [25]. </text>

<query_num> 1303 </query_num>
<text> As noted above, most existing STMs are single-versioned. Of these, SMV is most closely related to TL2 [11], from which we borrow the ideas of invisible reads, commit-time locking of updated objects, and a global version clock for consistency checking. In a way, SMV can be seen as a multi-versioned extension of TL2. Among multi-versioned STMs, the closest to SMV is LSA [25]. </text>

</top>
<top>
<paper_num> 14 </paper_num>
<paper_title> Group motion segmentation using a Spatio-Temporal Driving Force Model. </paper_title>
<doi> 10.1.1.190.7544 </doi>
<year> 2010 </year>
<paper_abstract> We consider the ‘group motion segmentation ’ problem and provide a solution for it. The group motion segmentation problem aims at analyzing motion trajectories of multiple objects in video and finding among them the ones involved in a ‘group motion pattern’. This problem is motivated by and serves as the basis for the ‘multi-object activity recognition ’ problem, which is currently an active research topic in event analysis and activity recognition. Specifically, we learn a Spatio-Temporal Driving Force Model to characterize a group motion pattern and design an approach for segmenting the group motion. We illustrate the approach using videos of American football plays, where we identify the offensive players, who follow an offensive motion pattern, from motions of all players in the field. Experiments using GaTech Football Play Dataset validate the effectiveness of the segmentation algorithm. 1. </paper_abstract>
<query_num> 1401 </query_num>
<text> In this work we consider the problem of group motion segmentation, and propose a solution for it. The group motion segmentation problem arises from video surveillance applications and sports video analysis, where a ‘group motion’ involving multiple participants is of interest. Specifically, we have in hand point trajectories from consecutive frames of a video sequence, and aim to group them into two or more clusters. While this may appear to be similar to the traditional motion segmentation problems [5, 7, 13, 16, 26, 33, 35, 36, 32], it is actually different in the following sense. </text>

<query_num> 1402 </query_num>
<text> A recent development in the area of video analysis and activity recognition is the need for analyzing these motion patterns of the participating group, which are also called ‘multi-object’ or ‘group’ activities [17, 31, 12, 14, 21, 11, 8, 37, 23, 19, 24, 28], and various approaches have been proposed to recognize the group motion pattern or detect a change or an anomaly. However, these works assume that all objects are involved in the activity, which is far from realistic scenarios where only a portion of the objects/people contribute to the specific group motion. The group motion segmentation problem explored here attempts to divide the point trajectories into participating group and non-participating group, or into multiple participating groups, each corresponding to a group activity. </text>

<query_num> 1403 </query_num>
<text> Turning back to traditional motion segmentation problems we find the majority of them addressing trajectories of feature points from independent 3-D rigid objects [5, 7, 13, 16, 26, 33] and the problem eventually boils down to subspace clustering. </text>

<query_num> 1404 </query_num>
<text> In this work we employ Lie group theory [27] and in particular establish a statistical model over Lie algebra. Lie group and Lie algebra based approaches play roles in invariant visual modeling and recognition [9, 25], robotics [6], 3-D rigid motion estimation [10, 1, 30], as well as dense flow field modeling [20]. In this work, we discuss a new application to group motion estimation. </text>

</top>
<top>
<paper_num> 15 </paper_num>
<paper_title> Proof-Carrying Code. </paper_title>
<doi> 10.1.1.207.5205 </doi>
<year> 2011 </year>
<paper_abstract> Dependent session types allow us to describe not only properties of the I/O behavior of processes but also of the exchanged data. In this paper we show how to exploit dependent session types to express proof-carrying communication. We further introduce two modal operators into the type theory to provide detailed control about how much information is communicated: one based on traditional proof irrelevance and one integrating digital signatures. </paper_abstract>
<query_num> 1501 </query_num>
<text> Session types [9] provide high-level specifications for the communication behavior of interacting processes along bidirectional channels. Recently, logical foundations for session types have been established via Curry-Howard correspondences with linear logic [4, 10]. </text>

<query_num> 1502 </query_num>
<text> Besides clarifying and unifying concepts in session types, such logical underpinnings provide simple means for generalization. One such extension to dependent session types [3, 17] allows us to express and enforce complex properties of data transmitted during sessions. </text>

<query_num> 1503 </query_num>
<text> In this section we will briefly review dependent session types and the facilities they provide in terms of proof-carrying communication. Dependent session types [3, 17] are a conservative extension of session types [9, 4, 5, 8] that allow us to not only describe the behavior of processes in terms of their input and output behavior but also enable us to describe rich properties of the communicated data themselves. </text>

</top>
<top>
<paper_num> 16 </paper_num>
<paper_title> Information extraction challenges in managing unstructured data. </paper_title>
<doi> 10.1.1.211.5676 </doi>
<year> 2008 </year>
<paper_abstract> Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wikibased user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community. 1. </paper_abstract>
<query_num> 1601 </query_num>
<text> The work described here has been carried out in the context of the Cimple project. Cimple started out trying to build community information management systems: those that manage data for online communities, using extraction, integration, and user interaction [13]. Over time, however, it became clear that such systems can be used to manage unstructured data in many contexts beyond just online communities. Hence, Cimple now seeks to build such a general-purpose unstructured data management system, then apply it to a broad variety of applications, including community information management [13], personal information management [3], besteffort/on-the-fly data integration [17], and dataspaces [14] (see www.cs.wisc.edu/~anhai/projects/cimple for more detail on the Cimple project). </text>

<query_num> 1602 </query_num>
<text> A very common method, for example, is to store text data in files, write the IE program as a script, or in a recently developed declarative language (e.g., xlog [18], AQL of System-T [16], UIMA atresearch.ibm.com/UIMA), then execute this program over these text files, using the file system for all storage. </text>

</top>
<top>
<paper_num> 17 </paper_num>
<paper_title> Incorporating prior knowledge in support vector machines for classification: A review. </paper_title>
<doi> 10.1.1.212.8677 </doi>
<year> 2008 </year>
<paper_abstract> For classification, support vector machines (SVMs) have recently been introduced and quickly became the state of the art. Now, the incorporation of prior knowledge into SVMs is the key element that allows to increase the performance in many applications. This paper gives a review of the current state of research regarding the incorporation of two general types of prior knowledge into SVMs for classification. The particular forms of prior knowledge considered here are presented in two main groups: class-invariance and knowledge on the data. The first one includes invariances to transformations, to permutations and in domains of input space, whereas the second one contains knowledge on unlabeled data, the imbalance of the training set or the quality of the data. The methods are then described and classified in the three categories that have been used in literature: sample methods based on the modification of the training data, kernel methods based on the modification of the kernel and optimization methods based on the modification of the problem formulation. A recent method, developed for support vector regression, considers prior knowledge on arbitrary regions of the input space. It is exposed here when applied to the classification case. A discussion is then conducted to regroup sample and optimization methods under a regularization framework. </paper_abstract>
<query_num> 1701 </query_num>
<text> Many classifiers like support vector machines (SVMs) [63] consider the first case where the patterns can be of two classes. For multi-class applications, there also exists learning machines that can tackle directly multi-class applications such as neural networks [4,33] or even multi-class support vector machines (MSVM) [69,21,8]. </text>

<query_num> 1702 </query_num>
<text> Support vector machines (SVMs) were first introduced as large margin classifiers [63]. For a separable training set, the margin is defined as the minimum distance between the points of the two classes, measured perpendicularly to the separating hyperplane. Maximizing this margin is a way for a learning algorithm to control the capacity and the complexity of the machine, and to select the optimal separating hyperplane amongst all the hyperplanes that separate the two classes of the training set. The control of the capacity allows to bound the generalization error [63] which is the probability of misclassification for new test samples [46]. </text>

<query_num> 1703 </query_num>
<text> One advantage of the SVMs is the form of the learning problems, since many general optimization softwares such as CPLEX, LOQO, Matlab linprog and quadprog are capable of solving the linear and quadratic programs derived from SVMs. Nonetheless, the scale of the problems led to develop specific methods such as chunking [29], decomposition [41] or its extreme case known as the Sequential Minimal Optimization (SMO) algorithm [42], of which modifications have been proposed [30]. Many softwares, usually available on the Internet, have been developed in the last years to speed up the training time or to deal with large data sets such as SVM light [25], libSVM [6,15] and libsvmTL [47], HeroSVM [12,13] or Core Vector Machine (CVM) [61]. </text>

<query_num> 1704 </query_num>
<text> In machine learning, the generalization ability of the obtained model depends on the number of data at hand. The more representative samples we have, the better we learn. Based on this simple fact, the idea of creating new samples to enlarge the training set was first introduced by [43] as virtual samples and in [1,2] as “hints”. In [40], learning on an extended training set by virtual samples was linked to regularization and it thus showed a justification for the method. </text>

<query_num> 1705 </query_num>
<text> Objects can also be coded by local distributions centered at the samples. In this case they are called soft-objects. One has then to define a similarity measure between a sample and such an object. Here, tangent vectors can be used to locally approximate the transformations [45] and lead to the Tangent Vector Kernels (TVK) introduced in [44]. </text>

</top>
<top>
<paper_num> 18 </paper_num>
<paper_title> The Recurrent Temporal Restricted Boltzmann Machine. </paper_title>
<doi> 10.1.1.216.5063 </doi>
<year> 2008 </year>
<paper_abstract> The Temporal Restricted Boltzmann Machine (TRBM) is a probabilistic model for sequences that is able to successfully model (i.e., generate nice-looking samples of) several very high dimensional sequences, such as motion capture data and the pixels of low resolution videos of balls bouncing in a box. The major disadvantage of the TRBM is that exact inference is extremely hard, since even computing a Gibbs update for a single variable of the posterior is exponentially expensive. This difficulty has necessitated the use of a heuristic inference procedure, that nonetheless was accurate enough for successful learning. In this paper we introduce the Recurrent TRBM, which is a very slight modification of the TRBM for which exact inference is very easy and exact gradient learning is almost tractable. We demonstrate that the RTRBM is better than an analogous TRBM at generating motion capture and videos of bouncing balls. 1 </paper_abstract>
<query_num> 1801 </query_num>
<text> As a probabilistic model, the TRBM is a directed graphical model consisting of a sequence of Restricted Boltzmann Machines (RBMs) [3], where the state of one or more previous RBMs determines the biases of the RBM in next timestep. This probabilistic formulation straightforwardly implies a learning procedure where approximate inference is followed by learning. The learning consists of learning a conditional RBM at each timestep, which is easily done with Contrastive Divergence (CD) [3]. Exact inference in TRBMs, on the other hand, is highly non-trivial, since computing even a single Gibbs update requires computing the ratio of two RBM partition functions. The approximate inference procedure used in [13] was heuristic and was not even derived from a variational principle. </text>

<query_num> 1802 </query_num>
<text> Models learned by CD1 are often reasonable generative models of the data [3], but if learning is continued with CD25, the resulting generative models are much better [11]. The RBM also plays a critical role in deep belief networks [4], [5], but we do not use this connection in this paper. </text>

<query_num> 1803 </query_num>
<text> We report the results of experiments comparing an RTRBM to a TRBM. The results in [14, 13] were obtained using TRBMs that had several delay-taps, which means that each hidden unit could directly observe several previous timesteps. To demonstrate that the RTRBM learns to use the hidden units to store information, we did not use delay-taps for the RTRBM nor the TRBM, which causes the results to be worse (but not much) than in [14, 13]. If delay-taps are allowed, then the results of [14, 13] show that there is little benefit from the hidden-to-hidden connections (which are W ′ ), making the comparison between the RTRBM and the TRBM uninteresting. </text>

</top>
<top>
<paper_num> 19 </paper_num>
<paper_title> Compiling Language Definitions: The ASF+SDF Compiler </paper_title>
<doi> 10.1.1.22.5679 </doi>
<year> 2000 </year>
<paper_abstract> Machine Interfaces|The Bottleneck Eect  High-level transformations have to be applied with extreme care, especially if their purpose is to simplify the compiler by reducing the number of dierent constructs Compiling Language Denitions: The ASF+SDF Compiler  11 that have to be handled later on. For instance, by rst transforming conditional rewrite rules to unconditional ones or associative list matching to term matching, the compiler can be simplied considerably, but at the expense of a serious degradation in the performance of the generated code. Similarly, transformation of default rules (which can be applied only when all other rules fail) to sets of ordinary rewrite rules that catch the same cases would lead to very inecient code. These transformations would perhaps be appropriate in a formal semantics of ASF+SDF, but in a compiler they cause a bottleneck whose eect is hard to undo at a later stage.  Since it would require a high-level transformation phase of the above kind, the compiler does not generate code for the Abstract Rewrite Machine (ARM) [Fokkink et al. 1998]. In fact, any xed abstract machine interface is a potential bottleneck in the compilation process. The modularization advantage gained by introducing it may be oset by a serious loss in opportunities for generating ecient code. This happens when, in the words of Franz [1994, Sec. 2], \the code generator eectively needs to reconstruct at considerable expense, information that was more easily accessible in the front-end, but lost in the transition to the intermediate representation."  The factors involved in the use of an abstract machine have a qualitatively dierent character. The abstract machine interface facilitates construction and verication  of the compiler, but possibly at the expens... </paper_abstract>
<query_num> 1901 </query_num>
<text> ASF+SDF [Bergstra et al. 1989; van Deursen et al. 1996] is the metalanguage of the ASF+SDF Meta-Environment [Klint 1993; van den Brand et al. 2001], an interactive environment for the development of languages and language-oriented tools, covering parsing, typechecking, translation, transformation, and execution of programs. </text>

<query_num> 1902 </query_num>
<text> We describe the current ASF+SDF compiler and compare its performance with that of other rewrite system and functional language compilers we were able to run, namely, Clean [Plasmeijer and van Eekelen 1994; Smetsers et al. 1991], Elan [Kirchner and Moreau 2001], Haskell [Peyton Jones et al. 1993; Peyton Jones 1996], Maude [Clavel et al. 1999], Opal [Didrich et al. 1994], and SML [Appel 1992]. </text>

<query_num> 1903 </query_num>
<text> The design of the compiler was infuenced by the experience gained in previous compiler activities within the ASF+SDF project itself [Dik 1989; Fokkink et al. 1998; Hendriks 1991; Kamperman 1996] as well as in various functional language and Prolog compiler projects elsewhere. The surveys [Hartel et al. 1996] on functional language compilation and [van Roy 1993] on Prolog compilation were particularly helpful. </text>

</top>
<top>
<paper_num> 20 </paper_num>
<paper_title> Labeling recursive workflow executions on-the-fly. </paper_title>
<doi> 10.1.1.222.3160 </doi>
<year> 2011 </year>
<paper_abstract> This paper presents a compact labeling scheme for answering reachability queries over workflow executions. In contrast to previous work, our scheme allows nodes (processes and data) in the execution graph to be labeled on-the-fly, i.e., in a dynamic fashion. In this way, reachability queries can be answered as soon as the relevant data is produced. We first show that, in general, for workflows that contain recursion, dynamic labeling of executions requires long (linearsize) labels. Fortunately, most real-life scientific workflows are linear recursive, and for this natural class we show that dynamic, yet compact (logarithmic-size) labeling is possible. Moreover, our scheme labels the executions in linear time, and answers any reachability query in constant time. We also show that linear recursive workflows are, in some sense, the largest class of workflows that allow compact, dynamic labeling schemes. Interestingly, the empirical evaluation, performed over both real and synthetic workflows, shows that our proposed dynamic scheme outperforms the state-of-the-art static scheme for large executions, and creates labels that are shorter by a factor of almost 3. </paper_abstract>
<query_num> 2001 </query_num>
<text> Scientific workflow systems are now becoming“provenance aware” by automatically recording data and module dependency during execution (e.g., Taverna [14], VisTrails [7] and Kepler [5]). By using such information, provenance queries such as “Was data item A (or Module M) used to produce data item B, either directly or indirectly?” are enabled. </text>

<query_num> 2002 </query_num>
<text> Workflow runs can be much larger (1000s of vertices) and structurally more complex than the specification due to repeated execution of sub-workflows, e.g., sequentially (loops), in parallel (forked executions) or through recursion. Much research has been devoted recently to develop compact and efficient labeling schemes for workflow runs [6, 13] and graphs in general [24, 16, 17, 15, 2, 9]. A significant shortcoming, however, of the existing schemes is that they all need to examine the entire graph before labeling is performed. </text>

<query_num> 2003 </query_num>
<text> (General DAGs) In contrast to the above results, compact labeling is impossible for general directed acyclic graphs (DAGs), since a known lower bound on the maximum label length is Ω(n) bits. This triggered several alternative approaches for efficiently answering reachability queries over large DAGs: Chain Decomposition [15], Tree Cover [2] and 2-Hop [9]. Other recent work includes Path-Tree [17] and 3-Hop [16] that combine the previous three approaches, and GRAIL [24] that is based on randomized interval labeling. </text>

</top>
<top>
<paper_num> 21 </paper_num>
<paper_title> Space-efficient construction of Lempel-Ziv compressed text indexes. </paper_title>
<doi> 10.1.1.228.4370 </doi>
<year> 2011 </year>
<paper_abstract> Abstract. A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. This is very important nowadays, since one can accommodate the index of very large texts entirely in main memory, avoiding the slower access to secondary storage. In particular, the LZ-index [G. Navarro, Journal of Discrete Algorithms, 2004] stands out for its good performance at extracting text passages and locating pattern occurrences. Given a text T[1..u] over an alphabet of size σ, the LZ-index requires 4|LZ|(1 + o(1)) bits of space, where |LZ | is the size of the LZ78-compression of T. This can be bounded by |LZ |  = uHk(T) + o(u log σ), where Hk(T) is the k-th order empirical entropy of T, for any k = o(log σ u). The LZ-index is built in O(u log σ) time, yet requiring O(u log u) bits of main memory in the worst case. In practice, the LZ-index occupies 1.0-1.5 times the text size (and replaces the text), but its construction requires around 5 times the text size. This limits its applicability to medium-sized texts. In this paper we present a space-efficient algorithm to construct the LZ-index in O(u(log σ + log log u)) time and requiring 4|LZ|(1 + o(1)) bits of main memory, that is, asymptotically the same space of the final index. We also adapt our algorithm to construct more recent reduced versions of the LZ-index, which occupy from 1 to 3 times |LZ|(1 + o(1)) bits, and show that these can also be built using asymptotically the same space of the final index. Finally, we study an alternative model in which we are given only a limited amount of main memory to carry out </paper_abstract>
<query_num> 2101 </query_num>
<text> Text Compression and Indexing. Despite that there has been some work on space-efficient inverted indexes for natural language texts [71, 58] (able to find whole words and phrases), until one decade ago it was believed that any general index for text searching (such as those that we are considering in this paper) would need much more space. In practice, the smallest index available was the suffix array [46], a compact version of suffix trees [1] requiring ulog u bits 3 to index a text of u symbols. Since the text requires ulog σ bits to be represented, the suffix array is usually much larger than the text (typically 4 times the text size). With the large texts available nowadays (e.g., the Human Genome consists of about 3 × 10 9 base pairs), one solution is to store the indexes on secondary memory. However, this has a significant impact on the running time of an application, as accesses to secondary memory are orders of magnitude slower. </text>

<query_num> 2102 </query_num>
<text> Several attempts to reduce the space of the suffix arrays have been made [41, 26, 65, 18, 25, 42, 19]. They aim at compressed indexing, which takes advantage of the regularities of the text to operate in space proportional to that of the compressed text (i.e., c times the size of the text compressed under some model, for some constant c). </text>

<query_num> 2103 </query_num>
<text> The main families of self-indexes based on suffix arrays [57] are the Compressed Suffix Arrays (CSAs for short) [65, 25] and FM-indexes (for “Full-text index in Minute space”) [18, 42, 19]. The latter compress suffix arrays via the Burrows-Wheeler Transform [10]. The compressibility in both families is usually measured in terms of the k-th order empirical entropy, Hk, which is a lower bound on the performance of statistical compressors based on predicting the next text symbol as a function of the k preceding ones. </text>

<query_num> 2104 </query_num>
<text> A separate track of indexes based on Lempel-Ziv compression [72, 73] was pursued in parallel to the research on compressing suffix arrays. These are generally called LZ-indexes [36, 55, 18, 64, 7]. Except for the first pioneering work [36], all the rest are self-indexes and based on the Lempel-Ziv compression algorithm of 1978 (LZ78) [73]. Their space performance is measured in terms of the output size of Lempel-Ziv compressors, which are based on exploiting the repetitions that arise in the text. This can be upper-bounded by the k-th order empirical entropy of the text, but it can be smaller when the text is repetitive. </text>

<query_num> 2105 </query_num>
<text> We are particularly interested in LZ-indexes, since they have shown to be very effective in practice for extracting text, displaying occurrence contexts, and locating many occurrences, outperforming suffixarray-based self-indexes at these tasks [56, 64, 5, 17]. In theory, only LZ-indexes achieve high-order entropy space together with O(log u) worst-case time per located occurrence. Moreover, in practice many pattern occurrences can be actually found in constant time. In particular, we will be interested in Navarro’s LZindex [55, 56] and its more recent variants [6, 7, 5]. </text>

<query_num> 2106 </query_num>
<text> Compressed Construction of Self-Indexes. Many works on compressed full-text self-indexes do not consider the space-efficient construction of the indexes. Yet, this aspect becomes crucial when implementing the index in practice. For example, the original construction of the CSA [26, 65] and FM-index [18] involves building first the suffix array of the text, using for example the algorithm of Larsson and Sadakane [40] or the one by Manzini and Ferragina [48]. Similarly, Navarro’s LZ-index is constructed over a non-compressed intermediate representation [55]. In both cases one needs in practice about 5 times the text size (in the case of the CSA and the FM-index, by using the deep-shallow algorithm [48]). </text>

<query_num> 2107 </query_num>
<text> The LZ-index can be built in O(ulog σ) time [55]. However, a large amount of storage is needed to construct it [56], mainly because of the pointer representation of the tries used at construction time. </text>

</top>
<top>
<paper_num> 22 </paper_num>
<paper_title> Darwin phones: the evolution of sensing and inference on mobile phones. </paper_title>
<doi> 10.1.1.232.4488 </doi>
<year> 2010 </year>
<paper_abstract> We present Darwin, an enabling technology for mobile phone sensing that combines collaborative sensing and classification techniques to reason about human behavior and context on mobile phones. Darwin advances mobile phone sensing through the deployment of efficient but sophisticated machine learning techniques specifically designed to run directly on sensor-enabled mobile phones (i.e., smartphones). Darwin tackles three key sensing and inference challenges that are barriers to mass-scale adoption of mobile phone sensing applications: (i) the human-burden of training classifiers, (ii) the ability to perform reliably in different environments (e.g., indoor, outdoor) and (iii) the ability to scale to a large number of phones without jeopardizing the “phone experience ” (e.g., usability and battery lifetime). Darwin is a collaborative reasoning framework built on three concepts: classifier/model evolution, model pooling, and collaborative inference. To the best of our knowledge Darwin is the first system that applies distributed machine learning techniques and collaborative inference concepts to mobile phones. We implement the Darwin system on the Nokia N97 and Apple iPhone. While Darwin represents a general framework applicable to a wide variety of emerging mobile sensing applications, we implement a speaker recognition application and an augmented reality application to evaluate the benefits of Darwin. We show experimental results from eight individuals carrying Nokia N97s and demonstrate that Darwin improves the reliability and scalability of the proof-ofconcept speaker recognition application without additional burden to users. </paper_abstract>
<query_num> 2201 </query_num>
<text> The reason we select speaker recognition is not because we intend to design a new speaker recognition algorithm (there is a considerable amount of literature on this topic [44, 43, 26, 18, 51]), but to show how Darwin improves a mobile sensing application inference quality. </text>

<query_num> 2202 </query_num>
<text> The feature vector consists of Mel Frequency Cepstral Coefficients (MFCCs) which are proven to be effective in speech audio processing [17][16][42]. We use coefficients 2 to 20 and skip the first coefficient because it models the DC component of the audio. </text>

<query_num> 2203 </query_num>
<text> Each speaker is modeled with a mixture of 20 gaussians (hence, a 20-component GMM). The reason we use GMM is because GMM algorithms are widely used in the speaker recognition literature [40][39]. An initial speaker model is built by collecting a short training sample – 15 seconds in our current implementation. </text>

<query_num> 2204 </query_num>
<text> The initial training phase is intended to be short so that the application can be used immediately by people without requiring a prolonged initialization phase. Clearly, a short training data set implies a less accurate classification model. For this reason off-the-shelf speaker recognition applications [47][18][23] use large number of samples typically several tens of seconds of speakers’ voices in order to achieve acceptable recognition accuracy. </text>

<query_num> 2205 </query_num>
<text> Sensor node co-operation is studied mainly in the context of static sensor networks where fusion [24, 38, 49] and aggregation [52, 45, 33] techniques are applied. </text>

<query_num> 2206 </query_num>
<text> The benefit of sensor nodes cooperation in the context of object tracking using distributed Kalman Filters is discussed in [36, 37]. In [24] the authors propose distributed energy efficient role assignment and [38] discusses signal processing techniques to reduce the amount of sensor data needed to detect an event, while [49] proposes the adoption of distributed average consensus in order to compensate sensing errors. In the CASA project [2] researchers adopt techniques for collaborative and adaptive sensing of the atmosphere using radar technologies. </text>

<query_num> 2207 </query_num>
<text> Semi-supervised machine learning techniques are investigated for word sense disambiguation [50], to identify subjective nouns [41], or to classify emotional and non emotional dialogues [28]. However, no work studies semi-supervised learning techniques in the context of mobile sensing applications or frameworks. </text>

<query_num> 2208 </query_num>
<text> Audio analysis for speaker identification is a well explored area in the literature [17, 42, 16, 44, 43, 26, 18, 40, 23]. Although we do not propose new speaker recognition techniques, we show how to build a lightweight speaker identification application capable of running on mobile phones. </text>

</top>
<top>
<paper_num> 23 </paper_num>
<paper_title> Security Goals: Packet Trajectories and Strand Spaces. </paper_title>
<doi> 10.1.1.24.1295 </doi>
<year> 2000 </year>
<paper_abstract> This material was presented in a series of lectures at fosad,  a summer school on Foundations of Security Analysis and Design, at the  University of Bologna Center at Bertinoro in September 2000. It has two  main purposes. The first </paper_abstract>
<query_num> 2301 </query_num>
<text> Wool and his colleagues [34] follow [11] in representing sets of packets as lists of disjoint colored rectangles, although in their work the relevantly di#erent sources and destinations are not inferred at the start. Instead, they are constructed by examining real configuration files, and splitting ip addresses into ranges according to the access list lines contained in the configuration files. One might alternatively dispense with lists of rectangles and instead represent the sets more directly, for instance using Binary Decision Diagrams (bdds) [3]. In our next section, we will instead consider how bdds can be used as an auxiliary structure to discover the set of atoms that naturally describe existing configuration files. </text>

<query_num> 2302 </query_num>
<text> Despite their simplicity, cryptographic protocols are frequently wrong. Lowe estimates that about half the protocols published fail to achieve their goals in some respect [28]. Since this comment concerns only published, peer-reviewed protocols, one may imagine that the success rate for proprietary protocols would be lower. However, as a consequence of intense work on this problem, including apparently hundreds of published papers, the quality of newer protocols such as ssh [50] and tls [8] seems much better. </text>

<query_num> 2303 </query_num>
<text> We call a bundle normal if it has no redundancies of type C-S and E-D, by analogy with Prawitz's normal deductions [42]. Many others have noted related properties, including [7, 40, 20]. </text>

</top>
<top>
<paper_num> 24 </paper_num>
<paper_title> Large Scale Distributed Deep Networks. </paper_title>
<doi> 10.1.1.258.5430 </doi>
<year> 2012 </year>
<paper_abstract> Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly- sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm. 1 </paper_abstract>
<query_num> 2401 </query_num>
<text> Deep learning and unsupervised feature learning have shown great promise in many practical applications. State-of-the-art performance has been reported in several domains, ranging from speech recognition [1, 2], visual object recognition [3, 4], to text processing [5, 6]. </text>

<query_num> 2402 </query_num>
<text> It has also been observed that increasing the scale of deep learning, with respect to the number of training examples, the number of model parameters, or both, can drastically improve ultimate classification accuracy [3, 4, 7]. These results have led to a surge of interest in scaling up the training and inference algorithms used for these models [8] and in improving applicable optimization procedures [7, 9]. </text>

<query_num> 2403 </query_num>
<text> In recent years commercial and academic machine learning data sets have grown at an unprecedented pace. In response, a great many authors have explored scaling up machine learning algorithms through parallelization and distribution [11, 12, 13, 14, 15, 16, 17]. </text>

<query_num> 2404 </query_num>
<text> In the context of deep learning, most work has focused on training relatively small models on a single machine (e.g., Theano [19]). Suggestions for scaling up deep learning include the use of a farm of GPUs to train a collection of many small models and subsequently averaging their predictions [20], or modifying standard deep networks to make them inherently more parallelizable [21]. </text>

<query_num> 2405 </query_num>
<text> We considered a number of existing large-scale computational tools for application to our problem, MapReduce [23] and GraphLab [24] being notable examples. We concluded that MapReduce, </text>

<query_num> 2406 </query_num>
<text> Stochastic gradient descent (SGD) is perhaps the most commonly used optimization procedure for training deep neural networks [25, 26, 3]. </text>

</top>
<top>
<paper_num> 25 </paper_num>
<paper_title> Back to the roots: a probabilistic framework for query-performance prediction. </paper_title>
<doi> 10.1.1.259.3013 </doi>
<year> 2012 </year>
<paper_abstract> The query-performance prediction task is estimating the effectiveness of a search performed in response to a query when no relevance judgments are available. Although there exist many effective prediction methods, these differ substantially in their basic principles, and rely on diverse hypotheses about the characteristics of effective retrieval. We present a novel fundamental probabilistic prediction framework. Using the framework, we derive and explain various previously proposed prediction methods that might seem completely different, but turn out to share the same formal basis. The derivations provide new perspectives on several predictors (e.g., Clarity). The framework is also used to devise new prediction approaches that outperform the state-of-the-art. </paper_abstract>
<query_num> 2501 </query_num>
<text> The query-performance prediction task has attracted a lot of research attention [4]. The goal of this task is to estimate the effectiveness of a search performed in response to a query when there is a lack of relevance judgments. The prediction can be performed before retrieval using the query and corpus-based information [19, 16]. Post-retrieval prediction, on the other hand, also uses information induced from the result list of the most highly ranked documents [4]. </text>

<query_num> 2502 </query_num>
<text> More specifically, various predictors are based on completely different hypotheses with respect to what results in effective retrieval. For example, some post-retrieval prediction methods rely on the premise that a result list that exhibits high clarity with respect to the corpus indicates effective retrieval [10, 1, 11, 5, 18]. Other prediction methods assume that effective retrieval should be manifested in a result list that is robust with respect to query perturbations [37, 42], document perturbations [36, 41], and retrieval method perturbations [2]. Another class of prediction methods is based on various analyses of retrieval scores in the result list [35, 14, 42, 30, 12, 13]. </text>

<query_num> 2503 </query_num>
<text> To address the questions stated here, we present a novel query-performance prediction framework that is based on fundamental probabilistic IR principles. We establish theframework by starting with a basic question that has not been explicitly addressed in previous work on prediction: “What is the probability that this result list is relevant to this query?”. This question is a generalization to the documentlist case of the core question of probabilistic retrieval [33, 22, 27]: “What is the probability that this document is relevant to this query?”. </text>

<query_num> 2504 </query_num>
<text> For example, we derive (explain) the Clarity predictor [10]. This derivation provides a novel perspective about the actual property of the result list that Clarity quantifies. Furthermore, we use the framework to show that some predictors that are based on the result-list robustness notion [37, 36, 41, 42] and on analysis of the retrieval-scores distribution [14], implicitly share the same formal basis for prediction. </text>

<query_num> 2505 </query_num>
<text> A recently proposed framework [21] explains several postretrieval predictors. The idea is that an effective result list is that which is similar to some pseudo effective list and dissimilar to a pseudo ineffective list. Our framework is shown to provide a formal probabilistic basis for this framework. </text>

<query_num> 2506 </query_num>
<text> The query-performance prediction task is stated as estimating the effectiveness of a ranking induced by retrieval method M over a corpus of documents D in response to query q in lack of relevance judgments [4]. </text>

</top>
<top>
<paper_num> 26 </paper_num>
<paper_title> Sense of direction in distributed computing. </paper_title>
<doi> 10.1.1.37.804 </doi>
<year> 2003 </year>
<paper_abstract> Sense of Direction is a property of labeled graphs which has been shown to have a definite  impact on computability and complexity in systems of communicating entities, and whose  applicability ranges from the analysis of graph classes to distributed object systems. The full  consequences of this property are still not known; in fact, the ongoing investigations continue to  bring new (often surprising) results, to establish unsuspected links with other research and/or application  areas, and to pose more questions than they answer. The aim of this paper is to provide  a view of the current status of research, describing some of the relevant results, and providing  pointers to future research directions.  1 Introduction  In its more general formulation, a distributed system is a collection of computational entities communicating by exchanging finite amounts of information, which we shall call messages. The exact nature of the entities (i.e., processors, processes, network nodes, agents,... </paper_abstract>
<query_num> 2601 </query_num>
<text> Subsequent research has focused on improvements of the constant factor of the message complexity and on improvements of the time complexity [49, 62], and adding fault-tolerance to the requirements [48, 50, 52, 61]. In the results for fault-tolerant election, f is the number of faulty processors, k is the number of initiators, and the faults are "fail-stop". </text>

<query_num> 2602 </query_num>
<text> A large amount of research has been devoted to the study of computability in anonymous systems; i.e., the study of what problems can be solved when there are no distinct identifiers associated to the nodes (e.g., [2, 9, 5, 42, 43, 51, 68]). Clearly, which problems can be solved depends on many factors including the structural properties of the system as well as the amount and type of structural knowledge available to the system entities (the nodes). The computational power of sense of direction in anonymous systems has been studied in [28] and shown to be linked to the notion of surrounding, introduced in [29]. </text>

<query_num> 2603 </query_num>
<text> For the election problem, in absence of sense of direction, a solution has been developed which uses O(n log log n) messages [19]; this is an improvement on the O(n log n) bound implied by the universal protocol. In presence of the dimensional sense of direction (traditional for hypercubes), several \Theta(n) election algorithms have been presented [24, 57, 64, 66]. Most of these solutions exploit the implicit region partitioning of the topology and an efficient and implicit scheme to compute and represent shortest paths. These results can be obtained also by using the more recent modular technique of [21]. </text>

<query_num> 2604 </query_num>
<text> As a consequence, we have a characterization of minimal SD in terms of Cayley graphs: a regular graph with symmetric labeling has a minimal SD iff it is a Cayley graph with a Cayley labeling [29]; this result holds also for directed graphs [10]. </text>

</top>
<top>
<paper_num> 27 </paper_num>
<paper_title> Application layer reachability monitoring for IP multicast. </paper_title>
<doi> 10.1.1.4.3237 </doi>
<year> 2005 </year>
<paper_abstract> Monitoring and management have become key requirements for the success of multicast deployment in the Internet. One of the most important monitoring tasks for multicast is to verify the availability of service in the network. This task is usually referred to as reachability monitoring. In this paper, we present an application layer multicast reachability monitoring system called sdr-monitor. Sdrmonitor  has emerged in response to the practical need of verifying service availability and detecting potential problems during the early years of native multicast deployment in the inter-domain. Sdr-monitor leverages an existing application and provides close to real-time reachability monitoring for the multicast infrastructure. Since its initial deployment in 1998, sdr-monitor has been serving the multicast community in detecting and correcting multicast reachability problems in the Internet. In addition, sdr-monitor pioneered a number of additional research projects in multicast monitoring and management. In this paper, we first present the architecture of the sdr-monitor system and its outputs. Then, by using a four-year reachability monitoring data set, we present a long term analysis of the reachability characteristics of the multicast infrastructure. Next, by using additional network layer information, we classify reachability problems. Finally, we evaluate sdr-monitor as a reachability monitoring system and identify a number of ways in which it could be improved. </paper_abstract>
<query_num> 2701 </query_num>
<text> Traffic generated by multimedia-based applications has evolved into a significant portion of Internet traffic[1]. As a result, there is a need to develop better mechanisms to support multimedia data delivery. New network-services, such as multicast delivery[2], quality-of-service[3], [4], and in-the-network processing[5] have all been proposed as potential solutions.  </text>

<query_num> 2702 </query_num>
<text> With the deployment of native multicast in the interdomain, the multicast community realized the need for a mechanism to monitor reachability as well as the quality of the multicast service in the Internet. With this goal in mind, sdr-monitor has been designed as a convenient mechanism to monitor multicast reachability on an interdomain scale. Prior to sdr-monitor, there were no mechanisms for multicast users to automatically learn the reachability of their multicast data at receiver sites. On the other hand, being an application layer reachability monitoring system, sdr-monitor is not necessarily the most effective way of performing reachability monitoring for multicast (see Section VI). From this perspective, it motivated a number of additional research projects for performing related monitoring and management tasks for multicast including MRM[13], RMPMon[14], HPMM[15], Multicast Beacon[16], Mantra[17], and MCPM[18]. </text>

<query_num> 2703 </query_num>
<text> Reachability monitoring in the original multicast network topology (known as MBone[19]) was relatively straightforward. The MBone network topology was a virtual, flat network. Reachability, in most cases, was all or nothing. Cases of only partial connectivity existed but were not typical[20]. As the MBone has evolved into a native network service, and as the multicast topology has become hierarchical, reachability monitoring has become more complicated. The opportunity for reachability problems to exist has increased. In the current hierarchical model, multicast service is realized by running a set of protocols. First, we use a protocol to construct a multicast forwarding tree connecting sources and receivers in a multicast group. Currently, Protocol Independent Multicast- Sparse Mode (PIM-SM)[21] is the most widely used protocol for multicast tree construction in the Internet. In addition, in order to provide inter-domain multicast service, we use Multiprotocol Border Gateway Protocol (MBGP)[22] to communicate multicast path availability and Multicast Source Discovery Protocol (MSDP)[23] to communicate multicast source availability among different domains in the network. Finally, the Internet Group Management Protocol (IGMP)[24] is used by end-hosts to dynamically join and leave multicast groups. As a result, the success of multicast service in the Internet requires successful interoperation of these protocols. </text>

<query_num> 2704 </query_num>
<text> Inter-domain Connectivity/Peering Problems: Another observation is that a number of announcements are only reported by one or a few number of non-local participants. In these cases, announcement originating sites and sdrmonitor participant sites may not be on the same local network, but are topologically close to each other–likely within the same autonomous system (AS). Reachability problems to other domains can be linked either to interdomain peering mis-configurations or more fundamental protocol problems. The limitations of the Multicast Source Discovery Protocol (MSDP)[23] is an example of a possible source of problems.  </text>

</top>
<top>
<paper_num> 28 </paper_num>
<paper_title> Incremental Syntactic Parsing of Natural Language Corpora with Simple Synchrony Networks. </paper_title>
<doi> 10.1.1.43.4494 </doi>
<year> 2001 </year>
<paper_abstract> This article explores the use of Simple Synchrony Networks (SSNs) for learning to parse  English sentences drawn from a corpus of naturally occurring text. Parsing natural  language sentences requires taking a sequence of words and outputting a hierarchical  structure representing how those words fit together to form constituents. Feed-forward  and Simple Recurrent Networks have had great difficulty with this task, in part because  the number of relationships required to specify a structure is too large for the number  of unit outputs they have available. SSNs have the representational power to output  the necessary O(n  2  ) possible structural relationships, because SSNs extend the O(n)  incremental outputs of Simple Recurrent Networks with the O(n) entity outputs provided  by Temporal Synchrony Variable Binding. This article presents an incremental  representation of constituent structures which allows SSNs to make effective use of both  these dimensions. Experiments on learning to ... </paper_abstract>
<query_num> 2801 </query_num>
<text> This article explores the use of Simple Synchrony Networks (SSNs) for learning to parse English sentences drawn from a corpus of naturally occurring text. The SSN has been defined in previous work [17, 23], and is an extension of the Simple Recurrent Network (SRN) [6, 7]. The SSN extends SRNs with Temporal Synchrony Variable Binding (TSVB) [33], which enables the SSN to represent structures and generalise across structural constituents. We apply SSNs to syntactic parsing of natural language as it provides a standard task on real world data which requires a structured output. Parsing natural language sentences requires taking a sequence of words and outputting a hierarchical structure representing how those words fit together to form constituents, such as noun phrases and verb phrases. The state-of-the-art techniques for tackling this task are those from statistical language learning [2, 3, 5, 20]. The basic connectionist approach for learning language is based around the SRN, in which the network is trained to predict the next word in a sentence [7, 8, 30] or else trained to assess whether a sentence is grammatical or not [24, 25]. However, the simple SRN has not produced results comparable with the statistical parsers, because its basic output representation is flat and unstructured. </text>

<query_num> 2802 </query_num>
<text> Apart from models based on SRNs, other forms of connectionist network have been proposed for handling the types of structured information required for language learning. For instance, Hadley and Hayward [13] propose a highly structured network which learns to generalise across syntactic structures in accordance with systematicity [9]. However, this approach is limited due to the amount of non-trainable internal structure required to enforce the appropriate generalisations. As discussed in Section 3.3 (and Henderson [15]), the SSN relies on temporal synchrony to produce a similar effect, which renders the generalisation ability of SSNs largely independent of its specific architectural details. Indeed, in experiments training a type B SSN on the same recursive grammar to that of Hadley and Hayward [13], a similar ability to generalise across syntactic structures was demonstrated [21, 22]. </text>

</top>
<top>
<paper_num> 29 </paper_num>
<paper_title> Pruning and Summarizing the Discovered Associations. </paper_title>
<doi> 10.1.1.43.7539 </doi>
<year> 1999 </year>
<paper_abstract> Association rules are a fundamental class of patterns that exist in data. The key strength of association rule mining is its completeness. It finds all associations in the data that satisfy the user specified minimum support and minimum confidence constraints. This strength, however, comes with a major drawback. It often produces a huge number of associations. This is particularly true for data sets whose attributes are highly correlated. The huge number of associations makes it very difficult, if not impossible, for a human user to analyze in order to identify those interesting/useful ones. In this paper, we propose a novel technique to overcome this problem. The technique first prunes the discovered associations to remove those insignificant associations, and then finds a special subset of the unpruned associations to form a summary of the discovered associations. We call this subset of associations the direction setting (DS) rules  as they set the directions that are followed by the... </paper_abstract>
<query_num> 2901 </query_num>
<text> In subjective interestingness research in data mining, [22, 11, 12, 19] proposed a number of methods for finding unexpected rules. Instead of asking the user to specify what he/she wants to see as in [8], these approaches ask the user to specify his/her existing knowledge about the domain. The system then finds those unexpected rules by comparing the user's knowledge with the discovered rules. Again, these methods do not prune and do not attempt to summarize the association rules. </text>

<query_num> 2902 </query_num>
<text> There are also a number of other methods in classification research for rule pruning such as pessimistic error rate [21] and minimum description length based pruning [15]. In our work, we choose the widely used chisquare test statistics for rule pruning. [1] introduces a technique to remove two types of redundant rules, i.e., simple and strict redundancy. </text>

</top>
<top>
<paper_num> 30 </paper_num>
<paper_title> Logic-Based Knowledge Representation. </paper_title>
<doi> 10.1.1.46.8231 </doi>
<year> 1999 </year>
<paper_abstract> . After a short analysis of the requirements that a knowledge  representation language must satisfy, we introduce Description Logics,  Modal Logics, and Nonmonotonic Logics as formalisms for representing  terminological knowledge, time-dependent or subjective knowledge, and  incomplete knowledge respectively. At the end of each section, we briefly  comment on the connection to Logic Programming.  1 Introduction  This section is concerned with the question under which conditions one may rightfully claim to have represented knowledge about an application domain, and not just stored data occurring in this domain.  1  In the early days of Artificial Intelligence and Knowledge Representation, there was a heated discussion on whether logic can at all be used as a formalism for Knowledge Representation (see e.g. [135, 91, 92]). One aspect of the requirements on knowledge representation formalisms that can be derived from the considerations in this section is very well satisfied by logical for... </paper_abstract>
<query_num> 3001 </query_num>
<text> In [146], a four-valued semantics characterizing the behaviour of a structural subsumption algorithm is introduced. The system Classic [29, 33] employs an "almost" complete structural subsumption algorithm [30]. Its only incompleteness stems from the treatment of individuals inside concept terms, which can, however, again be characterized with the help of a non-standard semantics. Since 1988, a new type of algorithms, so-called tableau-based algorithms , for reasoning in description logics has been developed. Here, the logical point of view is not only used to define the semantics, but also for the design of algorithms, that is, the inference problems are considered as deduction problems in logics. In principle, these algorithms are methods for generating finite models. They can be seen as specializations of the tableau calculus for first-order predicate logic [26, 170]. The non-standard syntax of DL and the restricted expressiveness of these logics allows to design terminating procedures, i.e., for many description languages one obtains decision procedures for the relevant inference problems (see [15] for an introductory exposition of tableau-based inference methods in DL). The first tableau-based subsumption algorithm was developed in [166] for the language ALC, which allows for the first five constructors of Figure 2. Since then, this approach for designing subsumption algorithms was extended to the instance problem [93, 15] and to various description languages extending ALC (see, e.g., [95, 94, 20, 21, 8] for languages with number restrictions; [6] for transitive closure of roles and [159, 99, 101] for transitive roles; [12, 89, 87, 22] for constructs that allow to refer to concrete domains such as numbers; and [10, 40, 8] for the treatment of general axioms of the form C := D, where C; D may both be complex concept terms). </text>

<query_num> 3002 </query_num>
<text> Undecidability and complexity results. Other important research contributions for DL are concerned with the decidability and the complexity of the subsumption problem in different DL languages. It has turned out that the languages used in early DL systems were too expressive, which led to undecidability of the subsumption problem [165, 147]. More recent undecidability results for extensions of ALC can be found in [13, 89, 20, 21, 87, 22]. </text>

<query_num> 3003 </query_num>
<text> 4 Nonmonotonic Logics This research area has also created a huge number of approaches and results, which we cannot describe in detail here. The following is a brief introduction into the existing approaches and the problems treated by these approaches. Overviews of this research area can, for example, be found in [71, 38]. In addition, there are several monographs on the topic [37, 120, 126, 3]. An annotated collection of influential papers in the area can be found in [80]. </text>

</top>
<top>
<paper_num> 31 </paper_num>
<paper_title> Towards an active network architecture. </paper_title>
<doi> 10.1.1.49.9823 </doi>
<year> 2007 </year>
<paper_abstract> Active networks allow users to inject customized programs into the nodes of the network. In this paper, we describe our vision of an active network architecture, outline our approach to its design, and survey the technologies that can be brought to bear on its implementation. In the course of this presentation we identify a number of research questions to be addressed and propose that the research community mount a joint effort to develop and deploy a wide area ActiveNet. 1. INTRODUCTION Traditional data networks passively transport bits from one end system to another. Ideally, the user data is transferred opaquely, i.e., the network is insensitive to the bits it carries and they are transferred between end systems without modification. The role of computation within such networks is extremely limited, e.g., header processing in packet-switched networks and/or signaling in connection-oriented networks. Active Networks break with tradition by allowing the network to perform customized c... </paper_abstract>
<query_num> 3101 </query_num>
<text> Similarly, "nomadic agents" [2] and "gateways" are nodes that support mobility. They are located at strategic points that bridge networks with vastly different bandwidth and reliability characteristics, such as the junctions between wired and wireless networks. Application-neutral work on TCP "snooping" [3] improves the performance of TCP connections by retaining per-connection state information at wireless base stations. Application-specific services performed at gateways include file caching and the transcoding of images [4]. The InfoPad [5] takes the process even further, by instantiating user-specific "pad servers" supporting a range of applications, such as voice and hand-writing recognition, at intermediate nodes. </text>

<query_num> 3102 </query_num>
<text> The preceding section assumed that the validation mechanism has access to information concerning authorizations, e.g., policy-initiated decisions as to the resources that can be made accessible to specific users or applications. We require a mechanism that supports the automated delegation of authorizations, in accordance with a straightforward model that both implementors and administrators can reason about. This issue was previously considered within the context of time-sharing systems [22, 23] but we are not aware of work that addresses delegation in as complex a system as the ActiveNet. Work on the cascading [24] and logic [25] of authentication, which has some of the delegation flavor we are looking for, may provide a starting point for further research. </text>

</top>
<top>
<paper_num> 32 </paper_num>
<paper_title> Detecting application-level failures in component-based Internet services. </paper_title>
<doi> 10.1.1.5.525 </doi>
<year> 2005 </year>
<paper_abstract> Pinpoint is an application-generic framework for using statistical learning techniques to detect and localize likely application-level failures in component-based Internet services. Assuming that most of the system is working most of the time, Pinpoint looks for anomalies in low-level behaviors that are likely to reflect high-level application faults, and correlates these anomalies to their potential causes within the system. In our experiments, Pinpoint correctly detected and localized over 70-88% of the faults, depending on the type of fault, we injected into our testbed system, as compared to the 50-70% detected by current techniques. By demonstrating the applicability of statistical learning and providing an application-generic platform on which additional machine learning techniques can be applied to the problem of fast failure detection, we hope to hasten the adoption of statistical approaches to dependability for complex software systems. </paper_abstract>
<query_num> 3201 </query_num>
<text> A significant part of recovery time (and therefore availability) is the time required to detect and localize service failures. A 2003 study by Business Internet Group of San Francisco (BIG-SF)[7] found that of the 40 top-performing web sites 72% had suffered user-visible failures in common functionality, such as items not being added to a shopping cart or an error message being displayed. These application-level failures, or brown-outs, include failures where only part of the functionality of a site goes down, and failures where functionality is only intermittently unavailable to end-users. Figure 1 shows one brown-out the authors have encountered. Our conversations with Internet service operators confirm that detecting and localizing these failures is a significant problem: one large site estimates that about 93% of the time they spend recovering from application-level failures is spent detecting (75%) and diagnosing them (18%) [12, 2]. Other sites agreed that brown-outs can sometimes take days to detect, though they are usually repaired quickly once found. This situation has a serious effect on the overall reliability of Internet services: a study of three sites found that earlier detection might have mitigated or avoided 65% of reported user-visible failures [31]. </text>

<query_num> 3202 </query_num>
<text> We introduced path-analysis for localizing failures in Internet services in [14], and in [13, 12] broadened our pathanalysis techniques to show how this visibility into a system can aid fault management, fault impact analysis, and evolution management. The primary differentiator between Pinpoint and Chen et al.’s fault diagnosis in [11] is that Chen assumes the pre-existence of labeled data (e.g., failed or successful). In contrast Pinpoint assumes no prior knowledge of faults, and starts with fault detection. While we briefly discussed fault detection using path-shape analysis in Section 4.1.2 of [12], this paper provides the details of the behavior modeling and anomaly detection algorithms for both pathshape analysis and component-interaction analysis, and evaluates when and how these fault detection algorithms work. </text>

<query_num> 3203 </query_num>
<text> The shape of a path is the ordered set of logical software components (as opposed to instances of components on specific machines) used to service a client request [12]. We represent the shape of a path in a call-tree-like structure, except that each node in the tree is a component rather than a call site (i.e., calls that do not cross component boundaries are hidden). We represent the normal behavior of these paths as a probabilistic context-free grammar (PCFG) [29], a structure used in natural language to calculate the probabilities of different sentences being generated by a particular language and the probabilities of different parses of a sentence. A PCFG consists of a set of grammar rules, Rij : N i → ζj , where N i is a symbol in the grammar and ζ j is a sequence of zero or more symbols in the grammar. Each grammar rule is annotated with a probability P (Rij), such that ∀i � j Rij = 1.   </text>

</top>
<top>
<paper_num> 33 </paper_num>
<paper_title> A new paradigm for public key identification. </paper_title>
<doi> 10.1.1.51.9793 </doi>
<year> 1996 </year>
<paper_abstract> The present article investigates the possibility of designing zero-knowledge identification schemes based on hard problems from coding theory. Zero-knowledge proofs were introduced in 1985, in a paper by Goldwasser, Micali and Rackoff ([16]). Their practical significance was soon demonstrated in the work of Fiat and Shamir ([11]), who turned zero-knowledge proofs of quadratic residuosity into efficient means of establishing user identities. In the present paper, we propose a new identification scheme, based on error-correcting codes, which is zero-knowledge and seems of practical value. Furthermore, we describe several variants, including one which has an identity based  character. The security of our schemes depends on the hardness of finding a word of given syndrome and prescribed (small) weight with respect to some randomly generated binary linear error-correcting code. This is, of course, not the first attempt to design a cryptographic scheme using tools from coding theory. The dif... </paper_abstract>
<query_num> 3301 </query_num>
<text> Subsequent research in the area has been aimed at achieving simpler functionalities at a lower cost in terms of computing load. This research has been quite successful in the setting of identification, where a user attempts to convince another entity of his identity by means of an on-line communication. Of course, the transaction should not give enough information to allow anyone else to misrepresent himself as the legitimate user, including the entity carrying the identification process. A major step forward in this area was made with zero-knowledge proofs, introduced in 1985, in a paper by Goldwasser, Micali and Rackoff ([16]) and whose practical significance for public key identification was soon demonstrated in the work of Fiat and Shamir ([11]). Still, zero-knowledge based techniques have continued to rely on number theory, even though the new protocols do not exactly follow the basic public key paradigm invented by Diffie and Hellman and requiring trap-door functions ([8]). Rather, they are based on one-way functions, which is a less stringent requirement and which opens the way to using simpler techniques, more combinatorial in spirit. In 1989, there were two attempts to build identification protocols based on simple operations (see [33, 24]). The first one relied on the intractability of some coding problem but, unfortunately, was not really practical, due to its heavy communication load. The other one was based on the so-called Permuted Kernel problem (PKP). It achieved limited computing time both on the user's side and on the verifier's side and low communication complexity. Furthermore, it was well suited to the environment of small portable devices using 8 bit microprocessors. </text>

</top>
<top>
<paper_num> 34 </paper_num>
<paper_title> Blobworld: A System for Region-Based Image Indexing and Retrieval. </paper_title>
<doi> 10.1.1.52.5804 </doi>
<year> 1999 </year>
<paper_abstract> Blobworld is a system for image retrieval based on finding coherent image regions which roughly correspond to objects. The image is segmented into regions by fitting a mixture of Gaussians to the pixel distribution in a joint color-texture-position feature space. Each region ("blob") is then associated with color and texture descriptors. Querying is based on the user specifying attributes of one or two regions of interest, rather than a description of the entire image. In order to make largescale retrieval feasible, we index the blob descriptions using a tree. Because indexing in the high-dimensional feature space is computationally prohibitive, we use a lower-rank approximation to the high-dimensional distance. Experiments show encouraging results for both querying and indexing. </paper_abstract>
<query_num> 3401 </query_num>
<text> Many current image retrieval systems perform retrieval based primarily on lowlevel image features, including IBM's Query by Image Content (QBIC) [6], Photobook [19], Virage [9], VisualSEEk [23], Candid [15], and Chabot [18]. Lipson et al. [16] retrieve images based on spatial and photometric relationships within and across simple image regions. Little or no segmentation is done; the regions are derived from low-resolution images. Jacobs et al. [13] use multiresolution wavelet decompositions to perform queries based on iconic matching. Ma and Manjunath [17] perform retrieval based on segmented image regions. Their segmentation is not fully automatic, as it requires some parameter tuning and hand pruning of regions. </text>

<query_num> 3402 </query_num>
<text> Much research has gone into dimensionality reduction [10] and new index trees [22, 24] to cope with the high dimensionality of indices built over color histograms. Work to date has focused on indexing the entire image or userdefined sub-regions, not on indexing automatically created image regions. Our indexing methods are based on those used in QBIC [10]. </text>

<query_num> 3403 </query_num>
<text> Each pixel is assigned a vector consisting of color, texture, and position features. The three color features are the coordinates in the L*a*b* color space [25]; we smooth these features to avoid oversegmentation arising from local color variations due to texture. The three texture features are contrast, anisotropy, and polarity, extracted at a scale which is selected automatically [2]. The position features are simply the (x; y) position of the pixel; including the position generally decreases oversegmentation and leads to smoother regions. We model the distribution of pixels in this 8-D space using mixtures of two to five Gaussians. We use the Expectation-Maximization algorithm [4] to fit the mixture of Gaussians model to the data. To choose the number of Gaussians that best suits the natural number of groups present in the image, we apply the Minimum Description Length (MDL) principle [20, 21]. Once a model is selected, we perform spatial grouping of connected pixels belonging to the same color/texture cluster.  </text>

</top>
<top>
<paper_num> 35 </paper_num>
<paper_title> Chosen-Ciphertext Secure Identity-Based Encryption in the Standard Model with short Ciphertexts. </paper_title>
<doi> 10.1.1.60.6456 </doi>
<year> 2006 </year>
<paper_abstract> We describe a practical identity-based encryption scheme that is secure in the standard model  against chosen-ciphertext (CCA2) attacks. Security is based on an assumption comparable to (but  slightly stronger than) Bilinear Decisonal Di#e-Hellman (BDDH). A comparison shows that our  construction outperforms all known identity-based encryption schemes in the standard model and  its performance is even comparable with the one from the random-oracle based Boneh/Franklin  IBE scheme. Our proposed IBE scheme has furthermore the property that it fulfills some notion of  "redundancy-freeness", i.e. the encryption algorithm is not only a probabilistic injection but also a  surjection. As a consequence the ciphertext overhead is nearly optimal: to encrypt k bit messages  for k bit identities and with k bit randomness we get 3k bit ciphertexts to guarantee (roughly) k  bits of security. </paper_abstract>
<query_num> 3501 </query_num>
<text> After Shamir proposed the concept of IBE in 1984 [38] it remained an open problem for almost two decades to come up with a satisfying construction for it. In 2001, Boneh and Franklin [8] proposed formal security notions for IBE systems and designed a fully functional secure IBE scheme using bilinear maps. This scheme and the tools developed in its design have been successfully applied in numerous cryptographic settings, transcending by far the identity based cryptography framework. IBE is currently in the process of getting standardized — from February 2006 on the new IEEE P1363.3 standard for “Identity-Based Cryptographic Techniques using Pairings” [25] accepts submissions. An alternative but less efficient IBE construction was proposed by Cocks [16] based on quadratic residues. Both IBE schemes (Cocks’ scheme only through Fujisaki-Okamoto [18]) provide security against chosen-ciphertext attacks. In a chosen ciphertext attack, the adversary is given access to a decryption oracle that allows him to obtain the decryptions of ciphertexts of his choosing. Intuitively, security in this setting means that an adversary obtains (effectively) no information about encrypted messages, provided the corresponding ciphertexts are never submitted to the decryption oracle. For different reasons, the notion of chosen-ciphertext security has emerged as the “right” notion of security for encryption schemes. We stress that, in general, chosen-ciphertext security is a much stronger security requirement than chosen-plaintext attacks [1], where in the latter an attacker is not given access to the decryption oracle. </text>

<query_num> 3502 </query_num>
<text> As we already pointed out, chosen-ciphertext secure IBE scheme were known to exist using generic reductions [12] based on Waters’ 2-level HIBE [40]. Recently the first direct chosen-ciphertext secure IBE scheme was constructed in [26]. </text>

<query_num> 3503 </query_num>
<text> The modes of operation CMC [22] and EME [23] both give chosen-ciphertext secure DEMs provided that the underlying block-cipher is a strong pseudorandom permutation and avoid the usual overhead due to the MAC. </text>

</top>
<top>
<paper_num> 36 </paper_num>
<paper_title> An Image Morphing Technique Based on Optimal Mass Preserving Mapping. </paper_title>
<doi> 10.1.1.62.2480 </doi>
<year> 2007 </year>
<paper_abstract> Abstract—Image morphing, or image interpolation in the time domain, deals with the metamorphosis of one image into another. In this paper, a new class of image morphing algorithms is proposed based on the theory of optimal mass transport. The P mass moving energy functional is modified by adding an intensity penalizing term, in order to reduce the undesired double exposure effect. It is an intensity-based approach and, thus, is parameter free. The optimal warping function is computed using an iterative gradient descent approach. This proposed morphing method is also extended to doubly connected domains using a harmonic parameterization technique, along with finite-element methods. Index Terms—Image interpolation, image morphing, image warping, mass preserving mapping, Monge–Kantorovich flow, optimal transport. I. </paper_abstract>
<query_num> 3601 </query_num>
<text> Optimal mass transport problem was first formulated by Gaspar Monge in 1781. It concerned finding an optimal way, in the sense of minimal transportation cost, of moving a pile of soil from one site to another. This problem was given a modern formulation by Kantorovich in 1948 [19], and is also known as the Monge–Kantorovich problem (MKP). The optimal mass transport problem has been extensively studied in the fields of econometrics, fluid dynamics, automatic control, transportation, statistical physics, shape optimization, expert systems, and meteorology [26]. Recently, this problem has been studied within the context of content-based image retrieval [22], [27], [28]. Pixels in an image are divided into several bins according to their color and spatial locations. The Earth Mover’s distance (EMD) is then calculated between the bins of two images and used for image retrieval. </text>

<query_num> 3602 </query_num>
<text> We have also proposed an MKP-based image registration algorithm [14], in which a pseudo-mass (intensity-weighted area) is preserved. Our algorithm can handle area preserving registration [12] naturally by simply assigning the pseudo mass density to unity on the entire domain of the image. Other approaches to using various classes of diffeomorphisms for registration and warping may be found for example in [25], [31], [32], and the references therein. In this paper, we present an improved approach for image morphing, with special efforts taken to reduce the double exposure effect (also referred to as the “fade-in and fade-out” effect). An improved three-step gradient descent approach (as opposed to the two-step approach used in [14] and [40]) is employed here for rectangular domains. We also explain how to extend this approach to irregular multiconnected domains. </text>

</top>
<top>
<paper_num> 37 </paper_num>
<paper_title> Probability Product Kernels. </paper_title>
<doi> 10.1.1.67.5993 </doi>
<year> 2004 </year>
<paper_abstract> The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product is then evaluated as the integral of the product of pairs of distributions. The kernel is straightforward to evaluate for all exponential family models such as multinomials and Gaussians and yields interesting nonlinear kernels. Furthermore, the kernel is computable in closed form for latent distributions such as mixture models, hidden Markov models and linear dynamical systems. For intractable models, such as switching linear dynamical systems, structured mean-field approximations can be brought to bear on the kernel evaluation. For general distributions, even if an analytic expression for the kernel is not feasible, we show a straightforward sampling method to evaluate it. Thus, the kernel permits discriminative learning methods, including support vector machines, to exploit the properties, metrics and invariances of the generative models we infer from each datum. Experiments are shown using multinomial models for text, hidden Markov models for biological data sets and linear dynamical systems for time series data. </paper_abstract>
<query_num> 3701 </query_num>
<text> One goal of this article is to explore a contact point between discriminative learning (supportvector machines and kernels) and generative learning (distributions and graphical models). Discriminative learning directly optimizes performance for a given classification or regression task.Meanwhile, generative learning provides a rich palette of tools for exploring models, accommodating unusual input spaces and handling priors. One approach to marrying the two is to estimateparameters in generative models with discriminative learning algorithms and optimize performance on a particular task. Examples of such approaches include conditional learning (Bengio and Fras-coni, 1996) or large margin generative modeling (Jaakkola et al., 1999). Another approach is to use kernels to integrate the generative models within a discriminative learning paradigm. The pro-posed probability product kernel falls into this latter category. Previous efforts to build kernels that accommodate probability distributions include the Fisher kernel (Jaakkola and Haussler, 1998),the heat kernel (Lafferty and Lebanon, 2002) and kernels arising from exponentiating KullbackLeibler divergences (Moreno et al., 2004). We discuss, compare and contrast these approaches tothe probability product kernel in Section 7. One compelling feature of the new kernel is that it is straightforward and efficient to compute over a wide range of distributions and generative mod-els while still producing interesting nonlinear behavior in, for example, support vector machine (SVM) classifiers. The generative models we consider include Gaussians, multinomials, the ex-ponential family, mixture models, hidden Markov models, a wide range of graphical models and (via sampling and mean-field methods) intractable graphical models. The flexibility in choosing adistribution from the wide range of generative models in the field permits the kernel to readily accommodate many interesting input spaces including sequences, counts, graphs, fields and so forth.It also inherits the properties, priors and invariances that may be straightforward to design within a generative modeling paradigm and propagates them into the kernel learning machine. This articlethus gives a generative modeling route to kernel design to complement the family of other kernel engineering methods including super-kernels (Ong et al., 2002), convolutional kernels (Haussler,1999; Collins and Duffy, 2002), alignment kernels (Watkins, 2000), rational kernels (Cortes et al., 2002) and string kernels (Leslie et al., 2002; Vishawanathan and Smola, 2002). </text>

<query_num> 3702 </query_num>
<text> This type of similarity measure is not unknown in the literature. In text recognition, for exam-ple, the so-called "cosine-similarity" (i.e. the dot product measure) is well entrenched, and it has been observed that preprocessing word counts by taking square roots often improves performance(Goldzmidt and Sahami, 1998; Cutting et al., 1992). Lafferty and Lebanon also arrive at a similar kernel, but from a very different angle, looking at the diffusion kernel on the statistical manifold ofmultinomial distributions (Lafferty and Lebanon, 2002). </text>

<query_num> 3703 </query_num>
<text> This result confirms previous intuitions in thetext classification community which suggest using squashing functions on word frequencies such as the logarithm or the square root (Goldzmidt and Sahami, 1998; Cutting et al., 1992). This alsorelated to the power-transformation used in statistics known as the Box-Cox transformation which may help make data seem more Gaussian (Davidson and MacKinnon, 1993). 8.2 Biological Sequence Classification For discrete sequences, natural generative models to consider are Markov models and hidden Markov models.  </text>

</top>
<top>
<paper_num> 38 </paper_num>
<paper_title> Cutoff rate optimal binary inputs with imperfect CSI. </paper_title>
<doi> 10.1.1.67.8602 </doi>
<year> 2006 </year>
<paper_abstract> Abstract — We use the cutoff rate to study the optimal binary input distributions for the Rayleigh flat-fading channel with imperfect receiver channel state information (CSI). First, we evaluate the cutoff rate and analyze the optimal binary input as a function of the CSI quality and receiver SNR. Next, we study the limiting distributions – BPSK and On-Off Keying (OOK) – and derive an analytic design rule that allows adaptive switching between these two as the receiver CSI changes. We establish the virtues of a modulation scheme that employs only these limiting distributions, rather than the full spectrum of binary inputs. Finally, we use our results to design an adaptive modulation scheme for Pilot Symbol Assisted Modulation systems. We show that switching between just BPSK and equiprobable-OOK is nearly optimal for moderate to large SNR, and that switching between BPSK and generalized-OOK is nearly optimal for all SNR. Index Terms — Adaptive modulation, cutoff rate, correlated </paper_abstract>
<query_num> 3801 </query_num>
<text> We use the cutoff rate Ro to characterize reliable rates. Cutoff rate analysis has been common since its reintroduction in [26], and studies have been conducted for perfect receiver CSI in [21] (independent fading), [23] and [27] (temporally correlated fading), and for no CSI multiple-input multipleoutput (MIMO) systems in [18]. The cutoff rate is a lower bound on capacity that also provides a bound on the random coding exponent (thereby characterizing the entire rate vs. performance curve) via Pe ≤ 2−N(Ro−R) ,whereRis the rate, and Pe the probability of decoding error for length N codewords [13],[30, Sections I and II]. Although certain encoding-decoding structures can achieve rates greater than Ro (e.g., turbo coding with iterative decoding), the cutoff rate remains a metric of interest for these systems, as well as others [6]. For example, in sequential decoding, the cutoff rate specifies the largest rate for which decoding complexity remains finite [4]. The cutoff rate often leads to a tractable analysis that often would not be possible through direct evaluation of the random coding exponent or the capacity. </text>

<query_num> 3802 </query_num>
<text> In PSAM [8], [32], known pilot symbols are multiplexed with data symbols for transmission through the communications channel. At the receiver, knowledge of these pilots is used to form channel estimates, which aid the detection of the data both directly (by modifying the detection rule based on the channel estimate) and indirectly (e.g., by allowing for estimate-directed modulation, power control, and media access). In general, there is no guarantee that PSAM-based approaches are optimal, and PSAM has been shown to be suboptimal when the channel coherence time is small and/or the SNR small from various perspectives [17], [25], [2], [11]. Nevertheless, the technique is of great practical significance. In addition to providing implementable receiver structures, PSAM facilitates accurate timing and synchronization. PSAM has been incorporated into many commercial and Military standards, and optimized approaches to PSAM have been studied from the perspectives of frequency and timing offset estimation [22], [14], bit-error rate (BER) [8], [11], [7], [12], and the channel capacity or its bounds [17], [3], [24], [29]. </text>

</top>
<top>
<paper_num> 39 </paper_num>
<paper_title> Semi-Latent Dirichlet Allocation: A Hierarchical Model for Human Action Recognition. </paper_title>
<doi> 10.1.1.69.7158 </doi>
<year> 2007 </year>
<paper_abstract> Abstract. We propose a new method for human action recognition from video sequences using latent topic models. Video sequences are represented by a novel “bag-of-words ” representation, where each frame corresponds to a “word”. The major difference between our model and previous latent topic models for recognition problems in computer vision is that, our model is trained in a “semi-supervised ” way. Our model has several advantages over other similar models. First of all, the training is much easier due to the decoupling of the model parameters. Secondly, it naturally solves the problem of how to choose the appropriate number of latent topics. Thirdly, it achieves much better performance by utilizing the information provided by the class labels in the training set. We present action classification and irregularity detection results, and show improvement over previous methods. 1 </paper_abstract>
<query_num> 3901 </query_num>
<text> Recognizing human actions from image sequences is a challenging problem in computer vision. It has applications in many areas, e.g., motion capture, medical bio-mechanical analysis, ergonomic analysis, human-computer interaction, surveillance and security, environmental control and monitoring, sport and entertainment analysis, etc. Various visual cues (e.g., motion [6, 8,16,20] and shape [26]) can be used for recognizing actions. In this paper, we focus on recognizing the action of a person in an image sequence based on motion cues. We develop a novel model of human actions based on the “bag-of-words” paradigm. Our model is motivated by the recent success of “bag-of-words” representation for object recognition problems in computer vision. The common paradigm of these approaches consists of extracting local features from a collection of images, constructing a codebook of visual words by vector quantization, and building a probabilistic model to represent the collection of visual words. While these models of an object as a collection of local parts are certainly not “correct” ones, for example they only model a few parts of objects and often ignore much structure, they have been demonstrated to be quite effective in object recognition tasks [9,12,15]. </text>

<query_num> 3902 </query_num>
<text> In particular, our model is based on the latent Dirichlet allocation (LDA) [2] model. LDA, the probabilistic Latent Semantic Indexing (pLSI) [13] model, and their variants have been applied to various computer vision applications, such as scene recognition [5, 10], object recognition [11,22,25], action recognition [19], human detection [1], etc. </text>

<query_num> 3903 </query_num>
<text> Our approach is closely related to a body of work on recognition using “bagof-words”. The “bag-of-words” model was originally proposed for analyzing text documents [2,13]. Recently, researchers in the computer vision community have used “bag-of-words” models for various recognition problems. Fei-Fei & Perona [10] use a variant of LDA for natural scene categorization. Sivic et al. [25], Fergus et al. [11] and Russell et al. [22] use pLSI for unsupervised object class recognition and segmentation. Niebles et al. [19] use pLSI for action recognition using spatial-temporal visual words. Bissacco et al. [1] use LDA for human pose classification from vector-quantized words from histograms of oriented gradients. </text>

</top>
<top>
<paper_num> 40 </paper_num>
<paper_title> A Theory of Shape by Space Carving. </paper_title>
<doi> 10.1.1.70.9715 </doi>
<year> 2000 </year>
<paper_abstract> In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple color photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the maximal photo-consistent shape, that (1) can be computed from an arbitrary volume that contains the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results from applying it to the reconstruction of geometrically-complex scenes from several photographs. The approach is specifically designed to (1) build 3D shapes that allow faithful reproduction of all input photographs, (2) resolve the complex interactions between occlusion, parallax, shading, and their effects on arbitrary collections of photographs of a scene, and (3) follow a “least commitment ” approach to 3D shape recovery. 1 1 </paper_abstract>
<query_num> 4001 </query_num>
<text> A fundamental problem in computer vision is reconstructing the shape of a complex 3D scene from multiple photographs. While current techniques work well under controlled conditions (e.g., small stereo baselines [1], active viewpoint control [2], spatial and temporal smoothness [3–5], or scenes containing curved lines [6], planes [7], or texture-less surfaces [8–12]), very little is known about scene reconstruction under general conditions. In particular, in the absence of apriorigeometric information, what can we infer about the structure of an unknown scene from N arbitrarily positioned cameras at known viewpoints? Answering this question has especially important implications for reconstructing real objects and environments, which often tend to be non-smooth, exhibit significant occlusions, and may contain both strongly-textured as well as texture-less surface regions (Fig. 1). </text>

<query_num> 4002 </query_num>
<text> Central to our analysis is the realization that parallax, occlusion, and scene radiance all contribute to a photograph’s ultimate dependence on viewpoint. Since our notion of photo-consistency implicitly ensures that all these 3D shape cues are taken into account in the recovery process, our approach is related to work on stereo [1, 14, 20], shape-from-contour [8, 9, 21], as well as shape-from-shading [22–24]. These approaches rely on studying a single 3D shape cue under the assumptions that (1) other sources of variability can be safely ignored, and (2) the input photographs contain features relevant to that cue [25]. 1 Unfortunately, these approaches cannot be easily generalized to attack the N-view reconstruction problem for arbitrary 3D scenes because neither assumption holds true in general. Implicit in this previous work is the view that untangling parallax, self-occlusion and shading effects in N arbitrary photographs of a scene leads to a problem that is either under-constrained or intractable. Here we challenge this view by showing that shape recovery from N arbitrary photographs of an unknown scene is not only a tractable problem but has a simple solution as well. </text>

<query_num> 4003 </query_num>
<text> Because no constraints on the camera viewpoints are imposed, our approach leads naturally to global reconstruction algorithms [12, 37] that recover 3D shape information from all photographs in a single step. This eliminates the need for complex partial reconstruction and merging operations [38, 39] in which partial 3D shape information is extracted from subsets of the photographs [32, 40–42], and where global consistency with the entire set of photographs is not guaranteed for the final shape. </text>

<query_num> 4004 </query_num>
<text> The remainder of this paper is structured as follows. Section 2 analyzes the constraints that a set of photographs place on scene structure given a known, locally-computable model of scene radiance. Using these constraints, a theory of photo-consistency is developed that provides a basis for characterizing the space of all reconstructions of a scene. Sections 3 and 4 then use this theory to present the two central results of the paper, namely the existence of the maximal photo-consistent shape and the development of a provably-correct algorithm called Space Carving that computes it. Section 4.1 then presents a discrete implementation of the Space Carving Algorithm that iteratively “carves” out the scene from an initial set of voxels. This implementation can be seen as a generalization of silhouette-based techniques like volume intersection [21, 44, 56, 57] to the case of gray-scale and full-color images, and extends voxel coloring [29] and plenoptic decomposition [30] to the case of arbitrary camera geometries. Section 5 concludes with experimental results on real and synthetic images. </text>

</top>
<top>
<paper_num> 41 </paper_num>
<paper_title> Simultaneous Feature Selection and Clustering Using Mixture Models. </paper_title>
<doi> 10.1.1.71.1764 </doi>
<year> 2004 </year>
<paper_abstract> Abstract—Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters. Index Terms—Feature selection, clustering, unsupervised learning, mixture models, minimum message length, EM algorithm. 1 </paper_abstract>
<query_num> 4101 </query_num>
<text> THE goal of clustering is to discover a “natural” grouping in a set of patterns, points, or objects, without knowledge of any class labels. Clustering, or cluster analysis, is prevalent in any discipline that involves analysis of multivariate data. It is, of course, impractical to exhaustively list the numerous uses of clustering techniques. Image segmentation, an important problem in computer vision, can be formulated as a clustering problem [21], [28], [55]. Documents can be clustered [23] to generate topical hierarchies for information access [53] or retrieval [5]. Clustering is also used to perform market segmentation [2], [11] as well as in biology, e.g., to study genome data [3]. </text>

<query_num> 4102 </query_num>
<text> Most of the literature on feature selection pertains to supervised learning (both classification [24] and regression [40]). Feature selection algorithms can be broadly divided into two categories [7], [33]: filters and wrappers. The filter approaches evaluate the relevance of each feature (subset) using the data set alone, regardless of the subsequent learning algorithm. RELIEF [32] and its enhancement [36] are representatives of this class, where the basic idea is to assign feature weights based on the consistency of the feature value in the k nearest neighbors of every data point. Informationtheoretic methods are also used to evaluate features: the mutual information between a relevant feature and the class labels should be high [4]. Nonparametric methods can be used to compute mutual information involving continuous features [37]. A feature can be regarded as irrelevant if it is conditionally independent of the class labels given other features. The concept of Markov blanket is used to formalize this notion of irrelevancy in [34]. </text>

<query_num> 4103 </query_num>
<text> Both approaches, filters and wrappers, usually involve combinatorial searches through the space of possible feature subsets; for this task, different types of heuristics, such as sequential forward or backward searches, floating search, beam search, bidirectional search, and genetic search have been suggested [9], [33], [47], [63]. It is also possible to construct a set of weak (in the boosting sense [20]) classifiers, with each one using only one feature, and then apply boosting, which effectively performs feature selection [59]. It has also been proposed to approach feature selection using rough set theory [35]. </text>

</top>
<top>
<paper_num> 42 </paper_num>
<paper_title> Toward Picture-perfect Streaming on the Internet. </paper_title>
<doi> 10.1.1.75.9758 </doi>
<year> 2005 </year>
<paper_abstract> Quality of service (QoS) in streaming of continuous media over the Internet is poor, which is partly due to variations in delays, bandwidth limitations, and packet losses. Although continuous media applications can tolerate some missing data, non-recoverable information loss degrades these applications ’ QoS. Consequently, a number of application areas have backed away from streaming of their content over the Internet. Inability to control the resulting visual and auditory quality of the streamed presentation is an important reason for such a trend. We believe that this trend can be reversed. To this end, this paper gives an overview of our efforts in exploring high quality streaming through the exploitation of multiple paths existing in the network. By high quality, we mean with significant bandwidth requirements, of relatively long duration, and without information loss or hiccups in data delivery. We believe this to be a promising approach. 1 </paper_abstract>
<query_num> 4201 </query_num>
<text> Given the above stated benefits, one important question that remains is how to distribute the load among multiple paths so as to optimize viewing quality. In [20] Golubchik et al. study MP streaming benefits using a round-robin (RR) approach to split the CM traffic among the multiple paths. </text>

<query_num> 4202 </query_num>
<text> Sophisticated media coding techniques (e.g., MD coding) have been developed for use with multi-path streaming [4, 8, 6, 9, 5, 24, 30, 13, 14, 16, 15]. The basic idea is to partition the media into multiple roughly equally important independently decodeable bitstreams (descriptions). Each of them contains complementary information and is sent through different paths, such that media quality can be improved as the number of received descriptions increases. </text>

<query_num> 4203 </query_num>
<text> In this context, issues such as dealing with heterogeneous path bandwidth constraints [8], rate-distortion optimization [24, 13, 14], coding efficiency, adaptation schemes [30, 16], etc. are explored. Specifically, in [6, 9], a model based on the conventional GM is also used in the analysis and path selection for MD coded multi-path streaming. In our work, we focus on a more general study of multi-path streaming without the use of a specific coding technique - an advantage being that it can then be made to work well with many coding techniques or media codecs. Also, our work adopts a more expressive model for characterizing the loss characteristics of paths between senders and a receiver. Developing an adaptation scheme which reacts to time-varying path characteristics is our ongoing effort. </text>

</top>
<top>
<paper_num> 43 </paper_num>
<paper_title> Optimal pricing for multiple services in telecommunications networks offering quality-of-service guarantees. </paper_title>
<doi> 10.1.1.85.9303 </doi>
<year> 2003 </year>
<paper_abstract> Abstract — We consider pricing for multiple services offered over a single telecommunications network. Each service has quality of service (QoS) requirements that are guaranteed to users. Service classes may be defined by the type of service, such as voice, video or data, as well as the origin and destination of the connection provided to the user. We formulate the optimal pricing problem as a nonlinear integer expected revenue optimization problem. We simultaneously solve for prices and the resource allocations necessary to provide connections with guaranteed QoS. We derive optimality conditions and a solution method for this class of problems, and apply to a realistic model of a multi-service communications network. </paper_abstract>
<query_num> 4301 </query_num>
<text> The most celebrated packet-switched network, the Internet, offers best effort service, which is prone to unpredictable congestion and delays by definition. The current flow control scheme in the internet is called transmission control protocol (TCP) [2] [34]. Modifications to this scheme have been suggested that would allow multiple service classes, and induce users to behave fairly and efficiently, through simple or randomized packet marking mechanisms in the network [10][13] [14] [20]. Usage based schemes, which charge based on the actual resources used have also been proposed for these types of networks [8] [29]. </text>

<query_num> 4302 </query_num>
<text> Priority pricing is another suggestion for allowing multiple services over best effort networks. The most well known work discusses a second bid auction, whereby it is incentive compatible, or in the users best interests to truthfully reveal their true valuation of service in terms of a priority [22][23][24]. Another bidding paradigm, which results in a Nash equilibrium among users was suggested for the Internet in [5]. A similar approach, again requiring users to bid for service, was proposed for available bit rate service, the besteffort service offered in asynchronous transfer mode (ATM) networks [6]. </text>

<query_num> 4303 </query_num>
<text> An in-depth analysis of equivalent capacity is outside the focus of this paper. Detailed discussions of buffering and multiplexing gains are given in [2][9][28][32] and [34]. </text>

</top>
<top>
<paper_num> 44 </paper_num>
<paper_title> Local Discriminant Wavelet Packet Coordinates for Face Recognition. </paper_title>
<doi> 10.1.1.86.2469 </doi>
<year> 2007 </year>
<paper_abstract> Face recognition is a challenging problem due to variations in pose, illumination, and expression. Techniques that can provide effective feature representation with enhanced discriminability are crucial. Wavelets have played an important role in image processing for its ability to capture localized spatial-frequency information of images. In this paper, we propose a novel local discriminant coordinates method based on wavelet packet for face recognition to compensate for these variations. Traditional wavelet-based methods for face recognition select or operate on the most discriminant subband, and neglect the scattered characteristic of discriminant features. The proposed method selects the most discriminant coordinates uniformly from all spatial frequency subbands to overcome the deficiency of traditional wavelet-based methods. To measure the discriminability of coordinates, a new dilation invariant entropy and a maximum a posterior logistic model are put forward. Moreover, a new triangle square ratio criterion is used to improve classification using the Euclidean distance and the cosine criterion. Experimental results show that the proposed method is robust for face recognition under variations in illumination, pose and expression. </paper_abstract>
<query_num> 4401 </query_num>
<text> Face recognition (Zhao et al., 2003; Jain et al., 2004) has become one of the most active research areas in pattern recognition. It plays an important role in many application areas, such as humanmachine interaction, authentication and surveillance. However, the wide-range variations of human face, due to pose, illumination, and expression, result in a highly complex distribution and deteriorate the recognition rate. It seems impractical to collect sufficient prototype images covering all the possible variations. Therefore, how to construct a small-size-training face recognizer robust to environmental variations is a challenging research issue. Wavelets have been successfully used in image processing. Their ability to capture localized spatial-frequency information of image motivates us to use them for feature extraction. In this study, we investigate a new approach by extracting the features not sensitive to environmental changes from a wavelet packet dictionary. </text>

<query_num> 4402 </query_num>
<text> Generally, feature extraction, discriminant analysis and classifying criterion are the three basic elements of a face recognition system. The performance and robustness of face recognition could be enhanced by improving these elements. Feature extraction in the sense of some linear or nonlinear transform of the data with subsequent feature selection is commonly used for reducing the dimensionality of facial image so that the extracted features are as representative as possible. A lot of work on face recognition has been carried out based on similarities analysis (P. Howland and Park, 2006; Belhumeur et al., 1997; Jiang et al., 2006; Martinez and Zhu, 2005; Vaswani and Chellappa, 2006; Xiang et al., 2006; Zhao et al., 2003). A well-known feature extraction method is called FisherFace, based on linear discriminant analysis (LDA), which linearly projects the image space to a low-dimensional subspace so as to discount environmental variations (Belhumeur et al., 1997; Fukunaga, 1990). This method is a statistical linear projection method which largely relies on the representation of the training samples. On the other hand, wavelet-based methods with no special focus on the training data have been used for feature extraction (Mallat, 1989; Coifman et al., 1992). The decomposition of the data into different frequency ranges allows us to isolate the frequency components introduced by intrinsic deformations due to expression or extrinsic factors (like illumination) into certain subbands. Wavelet-based methods prune away these variable subbands, and focus on the subbands that contain the most relevant information to better represent the data. WaveletFace (Chien and Wu, 2002) only uses the low-frequency subband to present the basic figure of an image, and ignores the efficacy of high-frequency components. Our previous study (Dai and Yuen, 2006) uses a wavelet enhanced regularized discriminant analysis after dimensionality reducing with low-pass filter to solve the small sample size problem, which is also a method based on the low frequency subband. Similarly, some other studies (Feng et al., 2000; Ekenel and Sanker, 2005; Zhang et al., 2004, 2005) employ the traditional transform (e.g., ICA, PCA, Neural Networks) to enhance the discriminant power in one or several special subbands, the latter always fuse the discriminant power in these different subbands for final classification (Ekenel and Sanker, 2005; Zhang et al., 2005). Moreover, as a generalization of the wavelet transform, the wavelet packet not only offers an attractive tool for reducing the dimensionality by feature extraction, but also allows us to create localized subbands of the data in both space and frequency domains. Saito and Coifman introduced the local discriminant basis (LDB) algorithm based on a best-basis paradigm to search for the most discriminant subbands (basis) that illuminates the dissimilarities among classes from the wavelet packet dictionary (Coifman and Saito, 1994; Saito and Coifman, 1994, 1995). Some studies (Saito et al., 2002; Stranss et al., 2003) constructed the modified LDB later. In Kouzani et al. (1997), the best-basis algorithm of Coifman and Wicherhauser (1992) is used to search for the wavelet packet basis for face representation. In Bhagavatula and Savvides (2005), PCA is performed in wavelet packet subbands and the subbands which generalize better across illumination variations for face recognition are sought. All the methods on these studies are based on the whole discriminant subband. </text>

<query_num> 4403 </query_num>
<text> However, the LDA algorithm often suffers from the “small sample size” (SSS) problem which exists in high-dimensional pattern recognition tasks, where the number of available samples is smaller than the dimensionality of the samples. Many methods (Mika et al., 1999; Baudat and Anouar, 2000; S. Mika and Müller, 2003; Yang, 2002) discard the discriminant information contained in the null space of Sw. But a significant result is a finding that there exists crucial discriminative information in the null space of Sw (Chen et al., 2000; Zhuang and Dai, 2007; Yang and Yang, 2003; Yu and Yang, 2001). We proposed the use of regularization (Dai and Yuen, 2003), but it involves a determination of parameters. Yang et al. (2005) proposed a complete kernel Fisher discriminant analysis algorithm which makes full use of two kinds of discriminant information, regular and irregular in kernel feature space. Its advantage is that no estimation of parameter is needed. Based on their idea, we use complete linear discriminant analysis (CLDA) in the LDC algorithm to solve the SSS problem. </text>

</top>
<top>
<paper_num> 45 </paper_num>
<paper_title> Building a reusable test collection for question answering. </paper_title>
<doi> 10.1.1.89.8867 </doi>
<year> 2006 </year>
<paper_abstract> In contrast to traditional information retrieval systems, which return ranked lists of documents that users must manually browse through, a question answering system attempts to directly answer natural language questions posed by the user. Although such systems possess language processing capabilities, they still rely on traditional document retrieval techniques to generate an initial candidate set of documents. In this paper, we argue that document retrieval for question answering represents a different task than retrieving documents in response to more general retrospective information needs. Thus, to guide future system development, specialized question answering test collections must be constructed. We have shown that the current evaluation resources have major shortcomings, and to remedy the situation, we have manually created a small, reusable question answering test collection for research purposes. This article describes our methodology for building this test collection and discusses issues we encountered along the way regarding the notion of “answer correctness”. 1. </paper_abstract>
<query_num> 4501 </query_num>
<text> Functionally, most factoid question answering systems today can be decomposed into four major components (see Figure 1): question analysis, document retrieval, passage retrieval, and answer extraction (cf. Hirschman and Gaizauskas, 2001; Voorhees, 2001). The question analysis component classifies user questions into the expected semantic type of the answer. Typical approaches include the use of heuristic rules (Hovy et al. 2001) and machine learning techniques (Li and Roth, 2002), both of which may refer to a custom question type hierarchy or existing resources such as WordNet (Harabagiu et al., 2000). As an example, the expected answer type of the question “Where was Kennedy assassinated?” is location. The question analysis module is often also responsible for formulating one or more queries to a document retriever; these queries are used to fetch a set of potentially relevant documents from the corpus. From these documents, the passage retrieval component selects a handful of paragraph-sized fragments for subsequent analysis. Most often, passage retrieval algorithms perform a density-based weighting of query terms, i.e., they favor query terms that appear close together (see Tellex et al., 2003 for a survey). The insight is that answers to a question are likely to occur in extents containing many closely-clustered query terms while documents containing all query terms spaced far apart are less likely to contain the answer. In some systems, document and passage retrieval are performed simultaneously (e.g., Clarke et al., 2000). Finally, the answer extraction module searches the passages for the actual answers. The basic strategy is to find named entities that match the expected answer type (Srihari and Li, 1999), although Light et al. (2001) has shown this method to be insufficient. Beyond simple matching of named entities, answer extractors may also employ more advanced linguistic processing technology, such as matching syntactic relations from the questions with those from the corpus (Katz and Lin, 2003) or attempting to “justify” the answer using an abductive proof (Harabagiu et al., 2000). In general, knowledge-based approaches (e.g., Prager et al., 2000), Web-based techniques (e.g., Brill et al., 2001), and statistical methods (e.g., Echihabi and Marcu, 2003) are well represented in question answering systems. </text>

<query_num> 4502 </query_num>
<text> 3. The TREC Question Answering Tracks Over the past few years, the question answering tracks at the Text Retrieval Conferences (TRECs) (Voorhees and Tice, 1999, 2000, 2000a; Voorhees, 2001, 2002, 2003), sponsored by the National Institute of Standards and Technology (NIST), have brought formal and rigorous evaluation methodologies to bear on the question answering task: features include blind testsets, shared corpora, comparable metrics, adjudicated human evaluations, and post-hoc stability analyses of performance metrics. The result is a benchmark that has gained community-wide acceptance⎯the event typically draws several dozens of teams from around the world every year. The TREC QA tracks have, in fact, become a locus of question answering research, serving not only as an annual forum for meaningful comparison of natural language processing and information retrieval techniques, but also as an efficient vehicle for the dissemination of research results. The TREC paradigm has been duplicated in similar question answering evaluations around the world, most notably CLEF in Europe and NTCIR in Asia (both focusing on cross-language issues). </text>

<query_num> 4503 </query_num>
<text> For ad hoc test collections, which may contain hundreds of thousands of documents, exhaustively assessing the relevance of every document with respect to a particular topic is simply not practical. Instead, the pooling methodology is employed. In this setup, each team that participates in the evaluation contributes a certain number of documents to the pool (the pool depth) from its ranked list of results. After removing duplicates, all documents in the pool are manually assessed for relevance, and these judgments are used to score all results of all systems. Typically, each system contributes its top 100 hits (for each topic) to the pool, and is scored on all 1000 hits it returns. Zobel (1998) performed an analysis of the pooling strategy and confirmed that system rankings produced from relevance judgments gathered in this fashion are both trustworthy and fair; this means that TREC test collections can be soundly used for post-hoc evaluations, i.e., they are reusable. The performance of a new retrieval system that did not participate in the TREC evaluation, and hence did not have an opportunity to contribute to the pool, can still be accurately measured by the pooled judgments. Researchers have probed other aspects of ad hoc test collections, including the effect of topic size (Voorhees and Buckley 2002), the effect of incomplete judgments (Buckley and Voorhees, 2004), the effect of different evaluation metrics (Buckley and Voorhees, 2000), and different notions of relevance (Voorhees 2000; Voorhees 2001a; Sormunen 2002); in general, they have confirmed the reliability of existing test collections as a laboratory tool for experimentation and validated the general TREC methodology as an effective means for creating such test collections. </text>

<query_num> 4504 </query_num>
<text> Relevance judgments form an integral part of test collections because they define the set of documents that should be retrieved in response to a user information need. Although the notion of relevance has been much debated in the literature (e.g., Cooper, 1971; Saracevic, 1975; Harter, 1992; Barry and Schamber, 1998; Spink and Greisdorf, 2001; Mizzaro, 1999), Voorhees (2000) has shown that comparative evaluation of retrieval performance is invariant with respect to substantial difference in relevance judgments (cf. Harter, 1996). Nevertheless, it is worthwhile to explore the diverse and subtle factors that influence a person’s assessment of relevance and the notion of support in a question answering task. In the process of building our test collection, we have encountered a variety of issues regarding the human interpretation of answer correctness, which we share here. These experiences can be applied to qualitatively improve answers returned by future systems (cf. Sparck Jones, 2003). </text>

</top>
<top>
<paper_num> 46 </paper_num>
<paper_title> Exploring the limits of leakage power reduction in caches. </paper_title>
<doi> 10.1.1.91.7701 </doi>
<year> 2005 </year>
<paper_abstract> If current technology scaling trends hold, leakage power dissipation will soon become the dominant source of power consumption. Caches, due to the fact that they account for the largest fraction of on-chip transistors in most modern processors, are a primary candidate for attacking the leakage problem. While there has been a flurry of research in this area over the Last several years, a major question remains unanswered. What is the total potential of existing architectural and circuit techniques to address this important design concern? In this paper, we explore the limits in which existing circuit and architecture technologies may address this growing problem. We first formally propose a parameterized model that can determine the optimal leakage savings based on the perfect knowledge of the address trace. By carefully applying the sleep and drowsy modes, we find that the total leakage power from the L1 instruction cache, data cache, and a unified L2 cache may be reduced to mere 3.6%, 0.9%, and 2.3%, respectively, of the unoptimized case. We further study how such a model can be extended to obtain the optimal leakage power savings for different cache configurations. </paper_abstract>
<query_num> 4601 </query_num>
<text> While our paper attempts to address a previously unanswered question, there is a great deal of prior work aimed at reducing leakage power in caches. Azizi et al. [Azizi et al. 2003] introduced asymmetric dual-Vt SRAM cell caches(ACCs). ACCs exploit the fact that in ordinary programs most of the bits in caches are zeros for both the data and instruction streams, and provide significant leakage reduction in the zero state. DRI-cache [Powell et al. 2001] uses the Gated-Vdd technique to dynamically adjust the size of the active portion of the cache by turning off a bank of cache lines based on the miss rates. DRG-cache [Agarwal et al. 2002] employs Gated-Vdd to reduce leakage power by turning off the gated-Ground transistor, while data is restored when the gated-Ground transistor is turned on. DTSRAM [H.Kim and Roy 2002] uses body biasing to separately control the Vt of each cache line. To minimize the energy and delay overhead, a cache line is switched to high Vt when it is not likely to be used anymore. Kaxiras et al. [Hu et al. 2002; Kaxiras et al. 2001] proposed the cache line decay scheme to turn off the cache lines in the dead periods of their cache generations using the Gated-Vdd technique. Instead of placing both the tag and the data into the sleep mode, AMC [Zhou et al. 2003] keeps the tag alive and tracks the miss rate with respect to the ideal miss rate. This helps to dynamically adjust the turn-off interval and control the overall performance. Velusamy et al. [Velusamy et al. 2002] used formal feedbackcontrol theory to adaptively adjust the cache decay interval and cache lines are turned off accordingly. Another approach to reducing leakage power is called drowsy cache [Flautner et al. 2002; Kim et al. 2004; 2002], which decreases the supply voltage of idle cache lines. Specifically, all cache lines are periodically placed into drowsy mode. [Kim and Mudge 2004] studied techniques for data retention with lower supply voltage. [Hu et al. 2003] employed drowsy cache to exploit program hot-spots and code sequentiality for instruction cache leakage management. Parikh et al. [Li et al. 2004] compared Gated-Vdd and drowsy cache at different L2 latencies with HotLeakage and showed Gated-Vdd is superior for a set of faster L2 latencies. Heo [Heo et al. 2002] reduced bitline leakage by leaving bitlines open whose cache banks are not accessed. Hanson [Hanson et al. 2001] found that for L1 caches, MTCMOS, which is a state-preserving technique that operates multiple threshold voltages, outperforms Gated-Vdd. In [Li et al. 2003], the authors presented several architectural techniques that exploit the data duplication across the different levels of cache hierarchy. They found that the best strategy in terms of energy and energydelay product is to place the L2 subblock into a state-preserving mode as soon as its contents are moved to L1 and to reactive it only when it is accessed. Bai et al. [Bai et al. 2005] investigated the impact of Tox and Vth on power performance tradeoffs for on-chip caches. In contrast, [Sankaranarayanan and Skadron 2004; Zhang et al. 2002] studied software approaches. [Sankaranarayanan and Skadron 2004] decided the decay interval through profiling and showed that the optimal decay intervals can be estimated with a reasonable degree of accuracy using profiling. [Zhang et al. 2002] studied using compiler to insert power mode instructions that control the voltage for the cache lines to control leakage energy. </text>

<query_num> 4602 </query_num>
<text> The model for optimal leakage power savings serves two major functions. First, instead of being an abstract model, it is coded in C language and is publicly available for cache leakage studies 1 . To use the tool, designers only need to feed the following inputs into the C program, such as the transition energies obtained from CACTI [Shivakumar and Jouppi 2001], the leakage power consumption from HotLeakage [Zhang et al. 2003], the interval distribution from SimpleScalar [Desikan et al. 2001], and the duration parameters {s1, s2, s3, s4}. The tool will then find out an optimal mode transition sequence that can achieve the maximal leakage power savings and output the optimal leakage saving percentages of using the optimal sleep, the optimal drowsy, and the optimal combining methods. Second and the most important is that this model was designed to explore the optimal leakage savings under different architectural and design assumptions with the hope of guiding research effort on leakage power study. </text>

</top>
<top>
<paper_num> 47 </paper_num>
<paper_title> A probabilistic framework for relational clustering. </paper_title>
<doi> 10.1.1.92.2174 </doi>
<year> 2007 </year>
<paper_abstract> Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probabilistic model for relational clustering, which also provides a principal framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The proposed model seeks to identify cluster structures for each type of data objects and interaction patterns between different types of objects. Under this model, we propose parametric hard and soft relational clustering algorithms under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unifies a number of stat-of-the-art clustering algorithms: co-clustering algorithms, the k-partite graph clustering, and semi-supervised clustering based on hidden Markov random fields. </paper_abstract>
<query_num> 4701 </query_num>
<text> Moreover, a number of important clustering problems, which have been of intensive interest in the literature, can be viewed as special cases of relational clustering. For example, graph clustering (partitioning) [7, 42, 13, 6, 20, 28] can be viewed as clustering on singly-type relational data consisting of only homogeneous relations (represented as a graph affinity matrix); co-clustering [12, 2] which arises in important applications such as document clustering and micro-array data clustering, can be formulated as clustering on bi-type relational data consisting of only heterogeneous relations. Recently, semi-supervised clustering [46, 4] has attracted significant attention, which is a special type of clustering using both labeled and unlabeled data. In section 5, we show that semi-supervised clustering can be formulated as clustering on singly-type relational data consisting of attributes and homogeneous relations. </text>

<query_num> 4702 </query_num>
<text> Clustering on a special case of relational data, bi-type relational data consisting of only heterogeneous relations, such as the word-document data, is called co-clustering or biclustering. Several previous efforts related to co-clustering are model based [22, 23]. Spectral graph partitioning has also been applied to bi-type relational data [11, 25]. These algorithms formulate the data matrix as a bipartite graph and seek to find the optimal normalized cut for the graph. Due to the nature of a bipartite graph, these algorithms have the restriction that the clusters from different types of objects must have one-to-one associations. Informationtheory based co-clustering has also attracted attention in the literature. [12] proposes a co-clustering algorithm to maximize the mutual information between the clustered random variables subject to the constraints on the number of row and column clusters. A more generalized co-clustering framework is presented by [2] wherein any Bregman divergence can be used in the objective function. Recently, coclustering has been addressed based on matrix factorization. [35] proposes an EM-like algorithm based on multiplicative updating rules. </text>

<query_num> 4703 </query_num>
<text> Graph clustering (partitioning) clusters homogeneous data objects based on pairwise similarities, which can be viewed as homogeneous relations. Graph partitioning has been studied for decades and a number of different approaches, such as spectral approaches [7, 42, 13] and multilevel approaches [6, 20, 28], have been proposed. Some efforts [17, 43, 21, 21, 1] based on stochastic block modeling also focus on homogeneous relations. </text>

<query_num> 4704 </query_num>
<text> Due to its simplicity, scalability, and broad applicability, k-means algorithm has become one of the most popular clustering algorithms. Hence, it is desirable to extend k-means to relational data. Some efforts [47, 2, 12, 33] in the literature work in this direction. However, these approaches apply to only some special and simple cases of relational data, such as bi-type heterogeneous relational data. As traditional k-means can be formulated as a hard version of Gaussian mixture model EM algorithm [29], we propose the hard version of MMRC algorithm as a general relational k-means algorithm (from now on, we call Algorithm 1 as soft EF-MMRC), which applies to various relational data. To derive the hard version MMRC algorithm, we omit soft membership parameters Λ (j) in the MMRC model (C (j) in the model provides the hard membership for each object). Next, we change the computation of the posterior probabilities in the E-step to reassignment procedure, i.e., in the E-step, based on the estimation of the current parameters, we re-assign cluster labels, {C (j) } m j=1, to maximize the objective function in (9). In particular, for each object, while fixing the cluster assignments of all other objects, we assign it to each cluster to find the optimal cluster assignment maximizing the objective function in (9), which is equivalent to minimizing the Bregman distances between the observations and the corresponding expectation parameters. After all objects are assigned, the re-assignment process is repeated until no object changes its cluster assignment between two successive iterations. </text>

<query_num> 4705 </query_num>
<text> The graphs based on the text data have been widely used to test graph partitioning algorithms [13, 11, 25]. In this study, we use various data sets from the 20-newsgroups [32], WebACE and TREC [27], which cover data sets of different sizes, different balances and different levels of difficulties. </text>

</top>
<top>
<paper_num> 48 </paper_num>
<paper_title> Secure Remote Authentication Using Biometric Data. </paper_title>
<doi> 10.1.1.93.6201 </doi>
<year> 2005 </year>
<paper_abstract> Abstract. Biometric data offer a potential source of high-entropy, secret information that can be used in cryptographic protocols provided two issues are addressed: (1) biometric data are not uniformly distributed; and (2) they are not exactly reproducible. Recent work, most notably that of Dodis, Reyzin, and Smith, has shown how these obstacles may be overcome by allowing some auxiliary public information to be reliably sent from a server to the human user. Subsequent work of Boyen has shown how to extend these techniques, in the random oracle model, to enable unidirectional authentication from the user to the server without the assumption of a reliable communication channel. We show two efficient techniques enabling the use of biometric data to achieve mutual authentication or authenticated key exchange over a completely insecure (i.e., adversarially controlled) channel. In addition to achieving stronger security guarantees than the work of Boyen, we improve upon his solution in a number of other respects: we tolerate a broader class of errors and, in one case, improve upon the parameters of his solution and give a proof of security in the standard model. 1 Using Biometric Data for Secure Authentication Biometric data, as a potential source of high-entropy, secret information, have been suggested as a way to enable strong, cryptographically-secure authentication of human users without requiring them to remember or store traditional cryptographic keys. Before such data can be used in existing cryptographic protocols, however, two issues must be addressed: first, biometric data are not uniformly distributed and hence do not offer provable security guarantees if used </paper_abstract>
<query_num> 4801 </query_num>
<text> Biometric data, as a potential source of high-entropy, secret information, have been suggested as a way to enable strong, cryptographically-secure authentication of human users without requiring them to remember or store traditional cryptographic keys. Before such data can be used in existing cryptographic protocols, however, two issues must be addressed: first, biometric data are not uniformly distributed and hence do not offer provable security guarantees if used as is, say, as a key for a pseudorandom function. While the problem of nonuniformity can be addressed using a hash function, viewed either as a random oracle [2] or a strong extractor [20], a second and more difficult problem is that biometric data are not exactly reproducible, as two biometric scans of the same feature are rarely identical. Thus, traditional protocols will not even guarantee correctness when the parties use a shared secret derived from biometric data. Much work has focused on addressing these problems in efforts to develop secure techniques for biometric authentication [8, 15, 19, 14, 22, 21]. Most recently, Dodis, Reyzin, and Smith [9] showed how to use biometric data to securely derive cryptographic keys which could then be used, in particular, for the purposes of authentication. Roughly speaking (see Section 2 for formal definitions), they introduce two primitives: a secure sketch which allows recovery of a shared secret given a close approximation thereof, and a fuzzy extractor which extracts a uniformly distributed string s from this shared secret in an error-tolerant manner. </text>

<query_num> 4802 </query_num>
<text> Although the above intuition is appealing, we remark that a number of subtleties arise when trying to apply this idea to obtain a provably secure solution. In particular, we will require the PAK protocol to satisfy a slightly stronger definition of security than that usually considered for PAK (cf. [1, 6, 12]); informally, the PAK protocol should remain “secure” even when: (1) the adversary can dynamically add clients to the system, with (unique) identities chosen by the adversary; (2) the adversary can specify non-uniform and dependent password distributions for these clients; and (3) the adversary can specify such distributions adaptively at the time the client is added to the system. Luckily, it is not difficult to verify that at least some existing protocols (e.g., [1, 17, 18, 11, 16]) satisfy a definition of this sort. 5 (Interestingly, the recent definition of [7] seems to imply the above properties.) Due to lack of space, the formal definition of security required for our application is deferred to the full version. </text>

</top>
<top>
<paper_num> 49 </paper_num>
<paper_title> Universal Approximation Capability of Cascade Correlation for Structures. </paper_title>
<doi> 10.1.1.96.9476 </doi>
<year> 2005 </year>
<paper_abstract> Cascade correlation (CC) constitutes a training method for neural networks which determines the weights as well as the neural architec-ture during training. Various extensions of CC to structured data have been proposed: recurrent cascade correlation (RCC) for sequences, re-cursive cascade correlation (RecCC) for tree structures with limited ¤ This work has been partially supported by MIUR grant 2002093941 004. We would like to thank two anonymous referees for profound and valuable comments on an earlier version of the manuscript. </paper_abstract>
<query_num> 4901 </query_num>
<text> Pattern recognition methods such as neural networks or support vector machines constitute powerful machine learning tools in a widespread area of applications. Usually, they process data in the form of real vectors of fixed and finite dimensionality. Canonical extensions of these basic models to sequential data which occur in language processing, bioinformatics, or time-series prediction, for example, are provided by recurrent networks or statistical counterparts (Bengio and Frasconi, 1996; Kremer, 2001; Sun, 2001). Recently, an increasing interest in the possibility to deal also with more complex non-standard data has been observed and several models have been proposed within this topic: specifically designed kernels such as string and tree kernels or kernels derived from statistical models constitute a canonical interface for kernel methods for structures (Haussler, 1999; Jaakkola, Diekhans, and Haussler, 2000; Leslie, Eskin, and Noble, 2002; Lodhi et al., 2000; Watkins, 1999). </text>

<query_num> 4902 </query_num>
<text> As already mentioned, RCC networks have been generalized to recursive cascade correlation (RecCC) networks for tree structured inputs (Bianucci et al., 2000; Sperduti, Majidi, and Starita, 1996; Micheli, 2003). Recursive networks constitute the analogous extension of recurrent networks to tree structures (Frasconi, Gori, and Sperduti, 1998; Goller and Küchler, 1996; Sperduti and Starita, 1997). For the latter, the universal approximation capability has been established in (Hammer, 2000). </text>

<query_num> 4903 </query_num>
<text> The context integrated in CRecCC neurons is of crucial importance. The available context of ¡ vertex at neuron can, for example, be measured in terms of the vertices of the DPAG which contribute directly or indirectly to¡¥¢�¢¡�. This context is precisely computed in (Micheli, Sona, and Sperduti, 2003a). As demonstrated in (Micheli, Sona, and Sperduti, 2003b; Micheli, 2003), this contextual information can in principle be used to compute certain transductions on graphs which require global knowledge of the structure and hence cannot be computed with simple recursive cascade correlation networks. Hence the functions which can be computed by RecCC networks on DPAGs form a proper subset of the functions which can be computed by CRecCC networks, e.g. the two structures depicted in Fig. 2 cannot be differentiated by a RecCC network, but they can be differentiated by a CRecCC network. RecCC networks constitute universal approximators for tree structures. </text>

</top>
<top>
<paper_num> 50 </paper_num>
<paper_title> Complexity results for security protocols with Diffie-Hellman exponentiation and commuting public key encryption. </paper_title>
<doi> 10.1.1.99.6124 </doi>
<year> 2008 </year>
<paper_abstract> We show that the insecurity problem for protocols with modular exponentiation and arbitrary products allowed in exponents is NP-complete. This result is based on a protocol and intruder model which is powerful enough to uncover known attacks on the Authenticated Group Diffie-Hellman (A-GDH.2) protocol suite. To prove our results, we develop a general framework in which the Dolev-Yao intruder is extended by generic intruder rules. This framework is also applied to obtain complexity results for protocols with commuting public key encryption. </paper_abstract>
<query_num> 5001 </query_num>
<text> Designing secure communication systems in open environments, such as the Internet, is a challenging task which heavily relies on cryptographic protocols. However, severe attacks can be conducted on these systems just by exploiting the inherent weaknesses of cryptographic protocols. Attacks on cryptographic protocols are easily overlooked at the design level as adversaries may control the communication network and may combine messages from different protocol sessions. Also, protocol This paper is a full version of work previously published in [Chevalier et al. 2003a] and [Chevalier et al. 2004]. The authors have partially been supported by PROCOPE and IST AVISPA. The second author was also supported by the DFG. </text>

<query_num> 5002 </query_num>
<text> The proofs of the NP-completeness results work in two steps: First, we extend the (standard) Dolev-Yao intruder by generic rules, called oracle rules. We show that the insecurity problem is NP-complete for intruders extended by such rules. Second, we show that the insecurity problem for the intruder extended by the ability to apply Diffie-Hellman exponentiation and commuting public key encryption, respectively, is an instance of the general framework established in the first step. Related work. The first works to relax the perfect cryptographic assumption by taking algebraic properties of operators into account are [Chevalier et al. 2003a; Comon-Lundh and Shmatikov 2003]. In these works, the Dolev-Yao model was extended by the XOR operator and its algebraic properties. For Diffie-Hellman exponentiation and commuting public key encryption, as studied here, things become, however, much more involved due to the more complex algebraic properties. In particular, unlike the XOR operator, here we do not have the nilpotency property and therefore need to record the number of occurrences of elements in a product, which requires to manipulate equations over integers. </text>

<query_num> 5003 </query_num>
<text> The protocol and intruder model we describe here extends standard models for the (automatic) analysis of security protocols [Amadio et al. 2002; Rusinowitch and Turuani 2001; Millen and Shmatikov 2001] in two respects. First, messages can be built using the operator Exp(·, ·), which stands for exponentiation, and a product operator “·”. Second, in addition to the standard Dolev-Yao rewrite rules, the intruder is equipped with the mentioned oracle rules, which are later instantiated with rules to exponentiate terms. In what follows, we provide a formal definition of our model by defining terms, messages, protocols, the intruder, and attacks. </text>

<query_num> 5004 </query_num>
<text> Our intruder model follows the Dolev-Yao intruder [Dolev and Yao 1983]. That is, the intruder has complete control over the network and he can derive new messages from his initial knowledge and the messages received from honest principals during protocol runs. To derive a new message, the intruder can compose and decompose, encrypt and decrypt messages, in case he knows the key. What distinguishes the intruder we consider here from the standard Dolev-Yao intruder is that we equip the intruder with guess rules which provide him with additional capabilities for deriving messages. In Section 2.5, we consider guess rules that satisfies certain conditions. We will call these rules oracle rules. </text>

</top>
