Earlier IR and friends

Here's stuff we did earlier. If you came here directly, you might be looking for this year's list of IR and friends meetings, and a blurb about the series.

Speakers in 2015

(Jump to: 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006.)

Monday 16 March
Liyuan (Jo) Zhou (NICTA)
Machine learning and external information for securities markets
CSIRO seminar room, 4-5
Monday 30 March
Kyosuke Tanaka (ANU)
A mask tells us more than a face: The role of self-disclosure in network formation on an online community

Although the degree of self-disclosure is usually self-manageable and is regarded as a hallmark of the web, little is known about the role of self-disclosure, especially in the online social network context. Studies about social identity and behaviour have uncovered a significant impact of self-disclosure on online communication (e.g., the social identity model of deindividuation effects). In contrast, prior research on online social networks has surprisingly overlooked the role of self-disclosure in network formation. One of the main reasons is that the social network approach emphasises the structure of social relations. However, this paper argues that it is also important to consider non-graph-theoretic attributes because they might play a confounding role in network formation. The paper thus focuses on the degree of self-disclosure of individual actors, and reveals how it influences the probability of reciprocity in network formation. To examine the relationship, this study collects data from LiveJournal and analyses it using an exponential random graph model.

CSIRO seminar room, 4-5
Monday 13 April
Mark Carman (Monash University)
Investigating performance and scalability issues for rank learning with regression tree ensembles

Rank learning algorithms provide an automated and coherent method for combining a diverse set of signals regarding the relevance of a document to a query (and user context) into a single retrieval score. As such, they have become a crucial component of current Information Retrieval infrastructure. State-of-the-art techniques for rank learning discover non-linear combinations of features and are mostly based on ensembles of regression trees, using either bagged & randomised regressors (as in Random Forests) or boosted ensembles (as in Gradient-boosted methods). With an interest in both the performance and scalability of these algorithms, we investigate three different issues: (i) the relative importance of negative examples (irrelevant documents) when training the algorithm, (ii) the importance of the size of the subsample used to learn individual regression trees, and (iii) the importance of the objective function used within the algorithm to recursively partition the feature space.

CSIRO seminar room, 4-5
Monday 27 April
Tom Worthington (ANU Research School of Computer Science and ACS Virtual College)
Innovations in teaching innovation

Tom takes over teaching the Australian Computer Society's "New Technology Alignment" (NTA) postgraduate course from January 2015. This is offered on-line directly by the ACS Virtual College and through Open Universities Australia. In late 2014 he attended a conference in Canada on computer science education and met with Canadian academics to discuss flexible learning and teaching innovation. Tom discusses how this might be done in Australia, by blending on-line formal courses with face-to-face competitions.

Tom has made slides and notes available.

CSIRO seminar room, 4-5
Monday 11 May
Banda Ramadan (Australian National University)
Unsupervised blocking key selection for real-time entity resolution

Real-time entity resolution (ER) is the process of matching query records in sub-second time with records in a database that represent the same real-world entity. Indexing is a major step in the ER process, aimed at reducing the search space by bringing similar records closer to each other using a blocking key criterion. Selecting these keys is crucial for the effectiveness and efficiency of the real-time ER process. Traditional indexing techniques require domain knowledge for optimal key selection. However, to make the ER process less dependent on human domain knowledge, automatic selection of optimal blocking keys is required. In this paper we propose an unsupervised learning technique that automatically selects optimal blocking keys for building indexes that can be used in real-time ER. We specifically learn multiple keys to be used with multi-pass sorted neighbourhood, one of the most efficient and widely used indexing techniques for ER. We evaluate the proposed approach using three real-world data sets, and compare it with an existing automatic blocking key selection technique. The results show that our approach learns optimal blocking/sorting keys that are suitable for real-time ER. The learnt keys significantly increase the efficiency of query matching while maintaining the quality of matching results.

The goal of Banda's research project is to develop new techniques for efficient real-time entity resolution for multiple large dynamic databases. The specific objectives are to develop indexing techniques that enable real-time matching of a stream of query records with several large databases.
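
As a rough illustration of the multi-pass sorted neighbourhood indexing mentioned in the abstract, here is a minimal static sketch. It is not the speaker's real-time index or key selection technique: the records, the field names (`surname`, `postcode`), the two sorting keys, and the window size are all made up for the example.

```python
from itertools import combinations


def sorted_neighbourhood_pairs(records, key, window):
    """One pass: sort by the key, then pair up records that share a sliding window."""
    ordered = sorted(records, key=key)
    pairs = set()
    for start in range(max(1, len(ordered) - window + 1)):
        block = ordered[start:start + window]
        for a, b in combinations(block, 2):
            pairs.add(tuple(sorted((a["id"], b["id"]))))
    return pairs


def multi_pass(records, keys, window):
    """Union of the candidate pairs produced by one pass per sorting key."""
    pairs = set()
    for key in keys:
        pairs |= sorted_neighbourhood_pairs(records, key, window)
    return pairs


# Hypothetical records and hand-picked keys (assumptions for illustration only).
records = [
    {"id": 1, "surname": "smith", "postcode": "2600"},
    {"id": 2, "surname": "smyth", "postcode": "2600"},
    {"id": 3, "surname": "jones", "postcode": "2601"},
    {"id": 4, "surname": "jonas", "postcode": "2913"},
]
keys = [lambda r: r["surname"], lambda r: r["postcode"]]
candidates = multi_pass(records, keys, window=2)
print(sorted(candidates))
```

Only the candidate pairs are passed on to detailed comparison, which is how the index reduces the search space. Choosing good keys by hand is the step that needs domain knowledge, and is what the talk proposes to automate.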

CSIRO seminar room, 4-5
Monday 25 May
Omid Rezvani (ANU)
Community detection in large networks

Recent studies have shown that social networks exhibit interesting characteristics such as community structures, i.e. vertices can be clustered into communities that are densely connected together and loosely connected to other vertices. Vertices in a community sharing common attributes are important for many applications such as bioinformatics and personalized recommendation. Thus, an important problem is to find communities in a large-scale network.

We first discuss the impact of the community detection problem and its diverse set of applications in computer science, and second, we explore the state-of-the-art techniques for identifying communities in a large-scale network. Finally, we explore the evaluation metrics for measuring the accuracy of community detection techniques and provide an empirical analysis of different methods.

CSIRO seminar room, 4-5
Monday 22 June
Peter Bailey (Microsoft Bing)
User variability and IR system evaluation

Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We explore two aspects of user variability with regard to evaluating the relative performance of IR systems, assessing effectiveness in the context of a subset of topics from three TREC collections, with the embodied information needs categorized against three levels of increasing task complexity.

First, we explore the impact of widely differing queries that searchers construct for the same information need description. By executing those queries, we demonstrate that query formulation is critical to query effectiveness. The results also show that the range of scores characterizing effectiveness for a single system arising from these queries is comparable to or greater than the range of scores arising from variation among systems using only a single query per topic. Second, our experiments reveal that searchers display substantial individual variation in the numbers of documents and queries they anticipate needing to issue, and there are underlying significant differences in these numbers in line with increasing task complexity levels. Our conclusion is that test collection design would be improved by the use of multiple query variations per topic, and could be further improved by the use of metrics which are sensitive to the expected numbers of useful documents.

CSIRO seminar room, 4-5
Monday 6 July
Mona Golestan Far (NICTA)
On term selection techniques for patent prior art search

In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system that extracts terms from the judged relevant documents far outperforms the baseline and performs twice as well on MAP as the best competitor in CLEF-IP 2010. We find a very clear term selection value threshold for use when choosing terms. We also noticed that most of the useful feedback terms are actually present in the original query and hypothesized that the baseline system could be substantially improved by removing negative query terms. We tried four simple automated approaches to identify negative terms for query reduction but we were unable to notably improve on the baseline performance with any of them. However, we show that a simple, minimal interactive relevance feedback approach where terms are selected from only the first retrieved relevant document outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.

CSIRO seminar room, 4-5
Monday 3 August
Paul Thomas (CSIRO)
Pooled evaluation over query variations: Users are as diverse as systems

Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Therefore an approach called pooling is typically used where, for example, the documents to be judged can be determined by taking the union of all documents returned in the top positions of the answer lists returned by a range of systems. Conventionally, pooling uses system variations to provide diverse documents to be judged for a topic; different user queries are not considered. We explore the ramifications of user query variability on pooling, and demonstrate that conventional test collections do not cover this source of variation. The effect of user query variation on the size of the judging pool is just as strong as the effect of retrieval system variation. We conclude that user query variation should be incorporated early in test collection construction, and cannot be considered effectively post hoc.

This is joint work with Alistair Moffat, Falk Scholer, and Peter Bailey.
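
The pooling step described in the abstract can be sketched in a few lines. This is a minimal illustration only: the system names, document identifiers, and pool depth are invented, and it is not the code used in the study.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each ranked list for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical ranked lists for a single topic from three systems.
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d2", "d5", "d1", "d6"],
    "system_c": ["d7", "d1", "d2", "d8"],
}
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

The same function applies unchanged if the ranked lists come from different user queries for the topic rather than from different systems, which is the source of variation the talk examines.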

CSIRO seminar room, 4-5
Monday 17 August
Ying-Hsang Liu (Charles Sturt University)
Individual differences, user perceptions and eye gaze in biomedical search interfaces

User search behaviour studies suggest that individual differences, such as domain knowledge, search experience and cognitive styles, are important factors affecting people’s interactions with search systems. However, very few studies have investigated the effect of individual differences on eye gaze for the design of natural search user interfaces. In this seminar, I will present findings from an eye-tracking study of biomedical domain experts’ interactions with novel search interfaces. Thirty-two participants searched eight complex questions on four different search interfaces, which are distinguished by whether the Medical Subject Headings (MeSH) terms are presented and how the displayed MeSH terms are generated. Our findings reveal that domain knowledge and search experience significantly affect the users’ areas of interest (AOI) of different interface elements. There is a significant interaction effect between search interfaces and cognitive styles. The implications for search user interface design will be discussed.

CSIRO seminar room, 4-5
Monday 14 September
Francis Crimmins (LucidWorks)
Mining events for recommendations

The EventMiner feature in Lucidworks Fusion can be used to mine event logs to power recommendations. We describe how the system uses graph navigation to generate diverse and high-quality recommendations.

CSIRO seminar room, 4-5
Monday 12 October
David Hawking (Microsoft)
Highlights of SIGIR 2015
CSIRO seminar room, 4-5
Monday 26 October
Gaya Jayasinghe (CSIRO)
Statistical comparisons of non-deterministic IR systems using two dimensional variance

Retrieval systems with non-deterministic output are widely used in information retrieval. Common examples include sampling, approximation algorithms, or interactive user input. The effectiveness of such systems differs not just for different topics, but also for different instances of the system. The inherent variance presents a dilemma: what is the best way to measure the effectiveness of a non-deterministic IR system? Existing approaches to IR evaluation do not consider this problem, or the potential impact on statistical significance. In this paper, we explore how such variance can affect system comparisons, and propose an evaluation framework and methodologies capable of doing this comparison. Using the context of distributed information retrieval as a case study for our investigation, we show that the approaches provide a consistent and reliable methodology to compare the effectiveness of a non-deterministic system with a deterministic or another non-deterministic system. In addition, we present a statistical best-practice that can be used to safely show how a non-deterministic IR system has equivalent effectiveness to another IR system, and how to avoid the common pitfall of misusing a lack of significance as a proof that two systems have equivalent effectiveness.

CSIRO seminar room, 4-5
Monday 9 November
Paul Thomas (CSIRO)
Measuring engagement with online forms

Online form-filling and transactions are extremely common, both for industry and government; and it is important to provide a satisfying user experience during these tasks if customers or citizens are to continue using online channels. However, reliable measures of experience in these cases are limited. Other areas of information interaction, e.g., online search, news, and shopping, are increasingly exploring and attempting to measure the concept of user engagement (UE). In this study, we ask whether UE is an appropriate outcome for the utilitarian activities of online form-filling and transactions with government websites.

We describe work in progress which examines the use of the User Engagement Scale (UES) in this setting, and which looks for behaviours which correlate with the UES. Early results suggest that, first, the UES can be adapted to such situations; and second, that readily observable user behaviours including time on site, mouse movements, and keypresses correlate with UES sub-scales and can, to some extent, predict users' responses.

This is joint work with Heather O'Brien and Tom Rowlands.

CSIRO seminar room, 4-5
Monday 23 November
Gabriela Ferraro (NICTA and ANU) and Lexing Xie (ANU and NICTA)
Teaching document computing

ANU's "document analysis" course covers information retrieval, machine learning, natural language processing, information extraction, and web technologies. It's been taught over the past five years by a changing group of researchers at NICTA and CSIRO, each taking a section of the course in their own specialty.

We'll discuss this course, how it's run, what we like (and dislike), what we've observed, and what we'd like to do in future to teach document analysis and natural language computing at the ANU.

CSIRO seminar room, 4-5
Monday 7 December
Haris Memic (ANU)
Introduction to Dynamic Social Network Analysis with Tnet

The large majority of statistical analyses of both offline and online social networks are based on methods for static networks, even though most social networks are dynamic. Online social networks frequently provide us with a unique opportunity to access exact data on human online (inter)actions, including the precise times of those actions.

As opposed to using modelling approaches for static or panel network data, often a better way of studying what happened in online social networks is to analyse the exact behaviour of users and the network growth as it happened, by using dynamic/longitudinal methods. This presentation gives a short introduction to some of the possibilities of dynamic network analysis. A very basic introduction to the tnet R framework will be provided. Dynamic network modelling will be illustrated on a Twitter Advocacy network, and results of the models will be discussed.

CSIRO seminar room, 4-5
Friday 11 December
Douglas W Oard (University of Maryland)
Building Search Engines for the "Bottom Billion"

About three quarters of a billion people are functionally illiterate, meaning that they have no more than a very basic ability to read or write. Modern search engines are powerful tools for much of the world's population, but if we are to build search engines for illiterate and low-literacy users we will need to come at the problem differently. I'll begin by describing two lines of work on this problem in the broad area known as Information and Communication Technology for Development (variously, ICTD or ICT4D), one that seeks to leverage visual interfaces, numeracy, and limited literacy, and a second that seeks to leverage speech. I'll then focus the rest of the talk on the work that we have been doing on speech-to-speech retrieval. The key challenge that we have sought to address is that most illiterate and low-literacy users don't speak any language for which we have the sorts of highly engineered Large-Vocabulary Continuous Speech Recognition (LVCSR) systems on which much of the recent work on speech retrieval depends. A shared-task evaluation in MediaEval started to tackle that challenge in 2011 using a Spoken Term Detection (STD) evaluation. The results there were promising, showing that systems could often recognize single terms in continuous speech based on examples, without any foreknowledge of the language. In our work, we have sought to build on one of these MediaEval systems to apply this STD capability to perform ad hoc ranked retrieval (i.e., finding recorded content that is most likely to satisfy a user’s information need). I'll describe the "Query by Babbling" interaction paradigm that we have been exploring, in which we ask what would happen if, instead of short queries and long result sets, as is appropriate for text, we had long queries and short result sets, perhaps a better approach for speech.
I'll then describe a test collection we have built using spoken content from a voice forum site used by farmers in Gujarat, India (speaking in Gujarati), some ranked retrieval systems that we have evaluated using that collection, and the results that we have obtained. I'll finish up with a few thoughts on where the remaining hard spots are with this technology, and what I see as next steps to address those challenges. This is joint work with Jerome White (NYU Abu Dhabi), Nintendra Rajput (IBM India Research Lab) and Aren Jansen (at the time at the Johns Hopkins HLTCOE).

Douglas Oard is a Professor at the University of Maryland, College Park, with joint appointments in the College of Information Studies (Maryland's iSchool) and the University of Maryland Institute for Advanced Computer Studies (UMIACS). Dr. Oard earned his Ph.D. in Electrical Engineering from the University of Maryland. His research interests center around the use of emerging technologies to support information seeking by end users. Additional information is available at http://terpconnect.umd.edu/~oard/.

Computer science seminar room (N101), 11-12. Please note the different venue and time!

Speakers in 2014

(Jump to: 2015, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006.)

Monday 3 February
Paul Thomas (CSIRO)
What users do: The eyes have it

Search engine result pages (the ten blue links) are a staple of document retrieval services. The usual presumption is that users read these one-by-one from the top, making judgments about the usefulness of documents based on the snippets presented, accessing the underlying document when a snippet seems attractive, and then moving on to the next snippet. In this talk we re-examine this assumption, and present the results of a user experiment in which gaze-tracking is combined with click analysis. We conclude that in very general terms, users do indeed read from the top, but that at a detailed level there are complex behaviors evident, suggesting that a more sophisticated model of user interaction might be appropriate. In particular, we argue that users retain a number of snippets in an "active band" that shifts down the result page, and that reading and clicking activity tends to take place within the band in a manner that is not strictly sequential.

CSIRO seminar room, 4-5
Monday 17 February
Tom Rowlands (CSIRO)
Information retrieval through textual annotations
CSIRO seminar room, 4-5
Monday 3 March
Ian Wood (ANU)
Regularised topic modelling

Bayesian topic modelling (LDA and its relatives) has become a popular tool for finding structure/themes in (possibly large) collections of texts. From a practical perspective, this unsupervised method can be troubled with finding (stronger) uninteresting patterns in the data, and missing (weaker) patterns that are of interest. I'm working on a method for providing a prior on topic contents, adapted from a method recently developed at NICTA for improving topic coherence. This method essentially allows one to specify a set of word lists that we wish to have increased probability of appearing together in topics. Example word lists are LIWC (Linguistic Inquiry and Word Count), a tool for analysing psychological aspects of texts, and words expected to be of particular relevance to a historical question (e.g., when analysing historical newsprint). This is work in progress, and I may or may not have evidence of the effectiveness of the approach by the time of this talk!

CSIRO seminar room, 4-5
Monday 31 March
Robert Ackland (ANU)
Frames and fields on Twitter

This presentation outlines preliminary findings from a research project focusing on activism on Twitter. We characterise an online activist field as a social arena in which participants vie for the definition of the most urgent cause or risk issue, and we ask the question: to what extent is it conceptually and empirically valid to regard protest activity on Twitter, such as the Occupy Wall Street movement, as an online activist field? Network analysis is used to examine two core aspects of field theory: the behaviour of incumbents and new entrants in response to a new issue or frame, and the dynamics of field formation. The project extends our earlier research on environmental social movement organisations and online collective identity formation, and contributes to emerging research on activism in the era of the "networked individual". The presentation also aims to highlight the methodological challenges and opportunities of "big data" (in particular, social media data) in empirical social science research.

Small CSIRO seminar room (S201, note change of venue), 4-5
Monday 14 April
Peter Christen (ANU)
Advanced record linkage methods and privacy aspects for population reconstruction

Recent times have seen an increased interest in techniques that allow the linking of records across databases. The main challenges of record linkage are (1) scalability to the increasingly large databases common today; (2) accurate and efficient classification of compared records into matches and non-matches in the presence of variations and errors in the data; and (3) privacy issues that occur when the linking of records is based on sensitive personal information about individuals. The first challenge has been addressed by the development of scalable indexing techniques, the second through advanced classification techniques that employ either machine learning or graph-based methods, and the third challenge is investigated by research into privacy-preserving record linkage.

In this presentation, we describe these major challenges of record linkage in the specific context of population reconstruction, outline recent developments of advanced record linkage methods, and provide directions for future research.

Peter Christen is an Associate Professor in the Research School of Computer Science at the Australian National University in Canberra. His research interests are in data mining and data matching (entity resolution). He is especially interested in the development of scalable and real-time algorithms for data matching, and privacy and confidentiality aspects of data matching and data mining. He has published over 80 papers in these areas, including the book "Data Matching" (2012, Springer), and he is the principal developer of the Febrl (Freely Extensible Biomedical Record Linkage) open source data cleaning, deduplication and record linkage system.

CSIRO seminar room, 4-5
Monday 28 April
Suvash Sedhain (ANU and NICTA)
Neighborhood-based social collaborative filtering

In our previous work on "Social Affinity Filtering" (COSN-13), we showed that social interactions and activities are highly predictive of user preferences and can be incorporated into highly effective social collaborative filtering systems. However, that work relied on supervised learning methods that cannot be applied directly to positive-only (implicit feedback) data, e.g., what books a user has purchased. In the case of implicit feedback data, neighborhood-based nearest neighbor methods have proven quite successful but have not been extended to incorporate social data. In this work, we investigate the social extension of neighborhood-based collaborative filtering along with various design choices and their impact on empirical performance.

This is joint work with Scott Sanner, Lexing Xie, and an external company (that has also provided the data).

CSIRO seminar room, 4-5
Monday 26 May
David Lovell and Paul Thomas (CSIRO)
Compositional Data Analysis (CoDA) approaches to distance in information retrieval

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole—term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR.

In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering.
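
To make the contrast between composition-naive and composition-aware distances concrete, here is a small sketch. The abstract does not say which measures were studied, so this uses the Aitchison distance (Euclidean distance after a centred log-ratio transform), a standard choice in the CoDA literature, purely as an assumed example. The term proportions are invented, and all parts must be strictly positive because zeros would need separate handling.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def euclidean(x, y):
    """Ordinary Euclidean distance, which ignores the compositional constraint."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed compositions."""
    return euclidean(clr(x), clr(y))


# Hypothetical two-term proportions: a rare term doubling vs a common term barely moving.
rare_before, rare_after = [0.01, 0.99], [0.02, 0.98]
common_before, common_after = [0.49, 0.51], [0.50, 0.50]

print(euclidean(rare_before, rare_after), euclidean(common_before, common_after))
print(aitchison(rare_before, rare_after), aitchison(common_before, common_after))
```

Euclidean distance treats the two changes as the same size, while the log-ratio distance rates the doubling of the rare term as the much larger change. The Aitchison distance is also unchanged if raw counts are used in place of proportions, since it depends only on ratios between parts.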

CSIRO seminar room, 4-5
Monday 23 June
Gabriela Ferraro (NICTA)
A study of query reformulation for patent prior art search with partial patent applications

Patents are used by entities to legally protect their inventions and represent a multi-billion dollar industry of licensing and litigation. In 2012, 276,788 patent applications were approved in the US alone—a number that has doubled in the past 15 years. While much of the literature inspired by the evaluation framework of the CLEF-IP competition has aimed to assist patent examiners in assessing prior art for complete patent applications, less of this work has focused on patent search with queries representing (partial) applications to help inventors to assess the patentability of their ideas prior to writing a full application. In this paper, we carry out an intensive study of query reformulation for patent prior art search with partial patent applications, with the objective of assessing not only the performance of standard query reformulation methods, but also the effectiveness of query reformulation methods that exploit patent-specific characteristics. We also propose new query reformulation methods that (a) exploit patent structure and (b) leverage techniques for diverse term selection in query reformulation. We demonstrate that our methods improve both general (MAP) and patent-specific (PRES) evaluation metrics for prior art search performance on standardized datasets of CLEF-IP, with respect to both general and specific query reformulation methods.

This is joint work with Mohamed Reda Bouadjenek and Scott Sanner.

CSIRO seminar room, 4-5
Friday 27 June
Ian Soboroff (NIST)
Fun hard problems in IR evaluation

Dr. Ian Soboroff is a computer scientist and manager of the Retrieval Group at the National Institute of Standards and Technology (NIST). The Retrieval Group organizes the Text REtrieval Conference (TREC), the Text Analysis Conference (TAC), and the TREC Video Retrieval Evaluation (TRECVID). These are all large, community-based research workshops that drive the state-of-the-art in information retrieval, video search, web search, text summarization and other areas of information access. He has co-authored many publications in information retrieval evaluation, test collection building, text filtering, collaborative filtering, and intelligent software agents. His current research interests include building test collections for social media environments and nontraditional retrieval tasks.

CSIRO seminar room, 3-4
Thursday 3 July
Grant Ingersoll (LucidWorks)
This ain't your father's search engine

In just a few short years, search has quickly evolved from being a small text box in the nether regions of a website to being front and center in our lives. Increasingly, however, search engine technology is also being used for practical, real-time recommendations, events processing, complex spatial functionality and time series analysis capable of not only matching users' queries in text, but also driving real-time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll will take a look at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.

Grant Ingersoll is the CTO and co-founder of LucidWorks as well as an active member of the Lucene community—a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project and a long standing member of the Apache Software Foundation. Grant's prior experience includes work at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval. Grant earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse University. Grant is also the co-author of "Taming Text" from Manning Publications.

(This is a joint IR and friends/iHcc talk.)

Manning Clark Centre, theatre 4, 6-7pm
Friday 4 July
Milad Shokouhi (Microsoft Research)
Recipes for PhD

A PhD is a bit like cooking. Most people follow similar steps but the outcome could be very different. Each of us has our own special recipes, and eventually we all make a unique PhD cookbook. In this talk, I'll share my recipes. Bon Appétit!

Milad Shokouhi is a Senior Applied Researcher working for Bing at Microsoft Research Cambridge. He is also an honorary lecturer in the School of Computing Science at the University of Glasgow. Before joining Microsoft, he did his PhD on federated search at RMIT University in 2007. His other research interests include auto-completion, personalization, federated search and query reformulation. He has published more than 30 papers and has served on the program committee of most major information retrieval conferences and journals.

RSISE seminar room, 2-3. RSISE is at building 115, corner of North and Daley Roads.
Monday 21 July
David Hawking and Peter Bailey (Microsoft); Paul Thomas (CSIRO)
A report back from the Workshop on Emerging Information Retrieval Directions, and this year's SIGIR conference.
CSIRO seminar room, 4-5
Monday 4 August
David Hawking and Peter Bailey (Microsoft)
What makes a high-impact paper?
CSIRO seminar room, 4-5
Monday 18 August
Paul Thomas (CSIRO)
Using interaction data to explain browsing difficulty

A user's behaviour when browsing a web site contains clues to that user's experience. It's possible to record some of these behaviours automatically, and extract signals that indicate a user is having trouble finding information. Some of these—such as time taken and amount of scrolling up a page—strongly predict navigation difficulty and can be recorded with minimal or no changes to existing sites or browsers. These can help web authors understand where and why their sites are hard to navigate.

(A paper based on this is to appear in ACM Trans Web.)

CSIRO seminar room, 4-5
Monday 13 October
Trung Nguyen (NICTA and ANU)
Gaussian process factorization machines for context-aware recommendations

In this talk I will describe Gaussian process factorization machines (GPFMs) for context-aware recommendations. The talk can naturally be divided into three parts. In the first part, I discuss a formulation of the collaborative filtering problem as a regression task (with unknown inputs). Then, I address this regression task using the powerful (nonlinear nonparametric) Gaussian process framework. Finally, I will present some experiments on real-world datasets in both the explicit and implicit feedback settings.

CSIRO seminar room, 4-5
Monday 27 October
Tom Rowlands (CSIRO)
Can we find barriers to government online service adoption from call centre interactions?

Government agencies generally wish to encourage their customers to use digital services in preference to ringing call centres or attending a shop front, as digital services are cheaper for the agency and more convenient for the customer. While this has been the case for some time, there are still barriers to adoption. We will discuss preliminary, in-progress research in identifying barriers to digital service adoption by observing relevant portions of customers' interactions with a government call centre.

CSIRO seminar room, 4-5
Monday 10 November
Simon Kravis (Aleka Consulting)
Why auto-classification is difficult

Despite the massive advances in web search performance over the last 10 years, the accessibility of electronic documents within organizations has not improved much, even with huge increases in available computing power, the widespread availability of search technology and the availability of different types of electronic document repositories. Automated methods of classifying documents using an organisational taxonomy would improve document accessibility but there is at present no commercial product available which can economically perform this task to a useful degree for an arbitrary organization. Hopeful entrants to this field appear to end up working only in niche markets such as law and medicine which can meet the high costs of configuration. This study reports on an attempt to bring auto-classification to a small, non-profit, lobbying organisation using a training set approach which has highlighted the many difficulties in the task. These include small, degenerate training sets, implicit knowledge and the 'bag of words' document model.

CSIRO seminar room, 4-5
Monday 24 November
Suvash Sedhain (ANU and NICTA)
Deep recommendation
CSIRO seminar room, 4-5
Monday 1 December
Diane Kelly (University of North Carolina, Chapel Hill)
Statistical power analysis for sample size estimation and understanding risks in experiments with users

One critical decision that researchers must make when designing experiments with users is how many participants to study. In our field, the determination of sample size is often based on heuristics and limited by practical constraints such as time and finances. As a result, many studies are underpowered and it is common to see researchers make statements like "With more participants significance might have been detected," but what does this mean? What does it mean for a study to be underpowered? How does this affect what we are able to discover about information search behavior, how we interpret study results and how we make choices about what to study next? How does one determine an appropriate sample size? What does it even mean for a sample size to be appropriate?

In this talk, I will discuss the use of statistical power analysis for sample size estimation in experiments. Statistical power analysis does not necessarily give researchers a magic number, but rather allows researchers to understand the risks of Type I and Type II errors given an expected effect size. In discussing this topic, the issues of effect size, Type I and Type II errors and experimental design, including choice of statistical procedures, will also be addressed. I hope this talk will function as a conversation starter about issues related to sample size in experimental interactive information retrieval.

Diane Kelly is an Associate Professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Her research and teaching interests are in interactive information search and retrieval, information search behavior, and research methods. Kelly was recently awarded the Association for Information Science and Technology (ASIST) Research Award. She is the recipient of the 2013 British Computer Society's IRSG Karen Spärck Jones Award, the 2009 ASIST/Thomson Reuters Outstanding Information Science Teacher Award and the 2007 SILS Outstanding Teacher of the Year Award. She is the current ACM SIGIR treasurer and served as conference program committee co-chair in 2013. She serves on the editorial boards of Information Processing & Management, Information Retrieval Journal and Foundations and Trends in IR. Kelly received a Ph.D., M.L.S. and a graduate certificate in cognitive science from Rutgers University and a B.A. from the University of Alabama.

RSISE seminar room, 4-5. RSISE is at building 115, corner of North and Daley Roads.
Monday 8 December
Eliza Murray (ANU)
Could order and ambition emerge from the fragmented climate governance complex?

Over the past two decades, climate change governance has become a complex web of institutions, with over 100 international forums and a vast array of national, local and non-government initiatives. Experts are worried that this fragmentation creates loopholes, inefficiencies and conflict, and some have called for centralised coordination through the UN. But could coordination instead emerge from the bottom up? From flocking birds to flowing traffic, complexity science has shown how order can emerge from the seemingly simple interactions between individuals in a system. This research adopts a systems perspective to analyse the dynamics of the climate governance complex across scales. Using hyperlink network analysis and qualitative methods, it measures the degree of fragmentation of the climate governance system and investigates which sectors and regions are most fragmented.

Eliza Murray is currently researching the global governance of climate change with the support of the Sir Roland Wilson Foundation. She was awarded the Garnaut Prize for Academic Excellence in 2012. Her previous roles include Director of Land Sector Policy at the (then) Australian Department of Climate Change.

CSIRO seminar room, 4-5

Speakers in 2013

(Jump to: 2015, 2014, 2012, 2011, 2010, 2009, 2008, 2007, 2006.)

Monday 18 February
Paul Thomas (CSIRO)
Users vs models: What observation tells us about effectiveness metrics

We examine the link between IR effectiveness metrics and models of user behaviour, then ask: what do users really do? Do our models and metrics capture this or are we off beam?

CSIRO seminar room, 4-5
Monday 4 March
Hanna Suominen (NICTA)
This is an alarm—or is it?

Supporting experts' situational awareness via IR on the levels of populations, individuals, documents, and snippets. Applications and their evaluation.

CSIRO seminar room, 4-5
Monday 18 March
Simon Kravis (Aleka Consulting)
Investigating an investigation
CSIRO seminar room, 4-5
Monday 15 April
Ehsan Abbasnejad (NICTA)
Learning community-based preferences via Dirichlet process mixtures of Gaussian processes

Bayesian approaches to preference learning using Gaussian Processes (GPs) are attractive due to their ability to explicitly model uncertainty in users' latent utility functions; unfortunately existing techniques have cubic time complexity in the number of users, which renders this approach intractable for collaborative preference learning over a large user base. Exploiting the observation that user populations often decompose into communities of shared preferences, we model user preferences as an infinite Dirichlet Process (DP) mixture of communities and learn (a) the expected number of preference communities represented in the data, (b) a GP-based preference model over items tailored to each community, and (c) the mixture weights representing each user's fraction of community membership. This results in a learning and inference process that scales linearly in the number of users rather than cubically and additionally provides the ability to analyze individual community preferences and their associated members. We evaluate our approach on a variety of preference data sources including Amazon Mechanical Turk, showing that our method is more scalable than, and as accurate as, previous GP-based preference learning work.

Smaller CSIRO seminar room (note change of venue: this is right next door to our usual spot), 4-5
Monday 27 May
David Hawking (Funnelback)
Making the best of poor user queries

A common cause of user disappointment with search arises from difficulty in posing an effective query. Choosing a good query is particularly critical behind the firewall where SEO is virtually non-existent and where many useful ranking features are typically missing.

Over the decades, many techniques have been developed which can assist, either in guiding users to choose better queries or in improving the queries which are submitted: query suggestion, query completion, query correction, query substitution, query expansion, query shortening, query segmentation, query translation and query blending.

In this talk I will review the state of the art in automatic query guidance and query improvement methods which are most useful in the context of enterprise search and demonstrate their operation.

CSIRO seminar room, 4-5
Monday 3 June
Sándor Darányi (Swedish School of Library and Information Science, University of Borås)
Narrative processing: challenges and implications

The computational modelling of narratives is becoming an application area of text processing and analytics in its own right. I will report on work in progress in two directions, trying to identify motifs in folk tales, and to recognise formulaic content elements in specific relations with one another in classical mythology. Interestingly, narratives show properties which make comparisons with bioinformatics and quantum interaction meaningful.

Sándor Darányi is Professor in the Swedish School of Library and Information Science at the University of Borås, Sweden. He holds an MSc in agriculture, an MA in library and information science, and a PhD in ethnography. His research interests go back to the interplay between language and ordering, and include machine learning, digital preservation, narrative generation, digital humanities, advanced forms of information representation, evolving semantics, and quantum-like systems. He worked in science diplomacy during the 6th RTD Framework Program of European Research and Technological Development, first as Science and Technology Attaché of Hungary to Finland (1997-2001), then as the National Contact Point for the social sciences and humanities (2001-2003), and initiated and/or contributed to several national and international RTD projects during the 7th Framework Program. Worth mentioning are SHAMAN ("Sustaining Heritage Access through Multivalent ArchiviNg", 2008-2011) and PERICLES ("Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics", 2013-2016).

N101 (downstairs seminar room), 1-2
Monday 24 June
Suvash Sedhain (NICTA)
Social Affinity Filtering

Content recommendation in social networks poses the complex problem of learning user preferences from a rich and complex set of interactions (e.g., likes, comments and tags for posts, photos and videos) and activities (e.g., favorites, group memberships, interests). While many social collaborative filtering approaches learn from aggregate statistics over this social information, we take a different approach to recommendation by analysing users' fine-grained interactions (e.g., users who have been tagged in the target user's video) and activities (e.g., users who have joined the same special interest group that the target user has joined). My research focuses on leveraging the rich set of information available in social networks for more personalised and effective recommendation. In this talk, I will be presenting my recent work in social collaborative filtering.

CSIRO seminar room, 4-5
Monday 8 July
Tom Worthington (TomW Communications)
MOOCs with Books

Synchronisation of large scale asynchronous e-learning

CSIRO seminar room, 4-5
Monday 5 August
Ian Wood (ANU)
Mining social norms in the pro-ana Twitter community

Since December last year, I have been collecting tweets on a collection of tags associated with the pro-ana (pro-anorexia) movement: over 400,000 tweets to date. The aim is to identify social norms in this data through their expression in group assimilation (new users entering the group and taking on the norms) and corrective behaviour (conflict associated with norm breaking). This remains work in progress, though some initial results are promising. As a bonus, time permitting, I may present ideas and initial work on "context sensitive LIWC": an attempt to combine LIWC (Linguistic Inquiry and Word Count, a word-frequency-based psychometric tool) with LDA (Latent Dirichlet Allocation, a topic modelling method).

CSIRO seminar room, 4-5
Monday 19 August
Lexing Xie (ANU)
Scalable mobile video retrieval with sparse projection learning and pseudo label mining

Retrieving relevant videos from a large corpus on mobile devices is a vital challenge. We address two key issues for mobile search on user-generated videos. The first is the lack of good relevance measurement, due to the unconstrained nature of online videos, for learning semantic-rich representations. The second is due to the limited resources on mobile devices, stringent bandwidth, and delay requirements between the device and the video server. We propose a knowledge-embedded sparse projection learning approach. To alleviate the need for expensive annotation for hash learning, we investigate varying approaches for pseudo label mining, where explicit semantic analysis leverages Wikipedia and performs the best. In addition, we propose a novel sparse projection method to address the efficiency challenge. It learns a discriminative compact representation that drastically reduces transmission cost. With fewer than 10% non-zero elements in the projection matrix, it also reduces computational and storage cost. The experimental results on 100K videos show that our proposed algorithm is competitive in performance with the prior state-of-the-art hashing methods, which are not applicable to mobiles and rely solely on costly manual annotations. The average query time on 100K videos is only 0.592 seconds.

This is joint work with Guan-Long Wu, Winston Hsu and others at National Taiwan University.

CSIRO seminar room, 4-5
Monday 26 August
Bradley Malin (Vanderbilt University)
Towards practical private data integration and analysis

"IR and friends" attendees may also be interested in this seminar: details at the CECS website.

CECS seminar room (N101), 4-5
Monday 2 September
Rob Ackland (ANU)
Some approaches for social scientific research using Twitter

In this presentation I provide preliminary findings from two Twitter research projects. The first involves the use of social movement theory and logistic regression to investigate the factors that predict whether a Twitter user will contribute to the emergence of a new hashtag, using a collection of Occupy Wall Street Twitter data as an example dataset. The second involves the use of index number theory (from economic decision theory) to develop new measures of attention and information consumption in social media.

CSIRO room S201 (next to the seminar room, note change of venue), 4-5
Monday 16 September
Ying-Hsang Liu (Charles Sturt University)
The effect of search task familiarity on search behaviours in biomedical search

Medical Subject Headings (MeSH) terms have been extensively used to organize information resources in the biomedical domain. Current search systems (e.g., PubMed and MEDLINE based on MeSH) use various retrieval techniques (e.g., suggested term mapping and query expansion) to map user queries to potentially relevant documents. But the usefulness of these retrieval techniques has rarely been evaluated in interactive search systems.

In this seminar the researchers will present a user study that was conducted to evaluate the effectiveness of a metadata based query suggestion interface for biomedical search, and to investigate the impact of search task familiarity on search behaviours. Forty-four researchers in Health Sciences participated in the evaluation; each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. The results show that when searching for an unfamiliar topic, users were more likely to change their queries. The proposed interface was relatively more effective when less familiar search requests were attempted. Implications for the evaluation of interactive search systems will be discussed.

Findings of this study have been recently published in: Tang, M.-C., Liu, Y.-H., & Wu, W.-C. (2013). A study of the influence of task familiarity on user behaviors and performance with a MeSH term suggestion interface for PubMed bibliographic search. International Journal of Medical Informatics, 82(9), 832-843.

CSIRO seminar room, 4-5
Monday 14 October
Peter Bailey (Microsoft)
Roger Clarke (Xamax Consulting Pty Ltd)
Kim Tiffen (Office of Research Integrity, ANU)
Ethical considerations in computer science research

Between us we do a lot of work with participants in labs, but also with data from log files, databases, social media, etc. A lot of that data is potentially sensitive, and by the time it reaches us it is being used in ways that the original authors/subjects might not expect.

We'll discuss the history and basic frameworks of human subjects research; practice at the ANU; and ethical considerations when dealing with people in the lab or with trace data collected online.

Please come along with your ethical riddles and practical questions.

(Material from the day: Roger's short paper.)

CSIRO seminar room, 4-5
Monday 28 October
Honglin Yu
Predicting YouTube video viewcount with Twitter feed

Our recent work proposes a novel method to use Twitter features to predict two difficult cases of content popularity on YouTube—the sudden jump in viewcount, and the viewcount of newly uploaded videos. User influence in Twitter and content popularity on YouTube are both very active areas of research, but little attention was devoted to measuring the effects of the former on the latter. We define two classification problems for viewcount jump and new video popularity, respectively. We extracted four types of features from Twitter, including information about tweets, Twitter user graph, and the interactions that users perform and receive. Prediction performances are reported on thousands of YouTube videos mentioned in a 3-month Twitter feed from 2009. The accuracy for predicting jump improves by 0.10 over a baseline of viewcount history; the accuracy for predicting early popularity improves by 0.25 over random baseline, where no history is available. These promising results will help a range of applications, including content recommendation on social media, advertising, and others.

CSIRO seminar room, 4-5
Monday 11 November
Simon Gog (University of Melbourne)
Succinct data structures: From theory to practice

Succinct data structures use space close to the compressed representation of the underlying objects but provide operations in the same time complexity as their uncompressed counterparts. Since the early 90s, more and more succinct versions of data structures have been proposed, ranging from bitvectors to complex information retrieval (IR) systems. Despite their attractive theoretical properties, there are only rare examples of practical use in systems. One reason for this is that an efficient implementation of complex structures, like a compressed text index, requires not only a profound knowledge of data compression and structures but also of modern hardware. In this seminar, I will give an introduction to succinct data structures and present the C++ template library SDSL, which represents the state of the art in the field. This library facilitates easy composition of compression and indexing tools, which can be used to operate on large data sets in areas such as Bioinformatics, IR, and Natural Language Processing. One recent example of the adoption of the techniques in industry is the social graph of Facebook.

The library code is available at: https://github.com/simongog/sdsl-lite.

Bio: Simon Gog is a postdoctoral researcher in the Department of Computing and Information Systems at The University of Melbourne. He completed his PhD in 2011 at Ulm University, Germany, and has research interests in string processing, information retrieval, and algorithm engineering. His main focus is on implementing efficient compact or succinct structures, and on carrying out experimental investigations in a way that leads to reliable and reproducible results.

CSIRO seminar room, 4-5
Monday 25 November
Sanat Bista (CSIRO)
Social Trust Based Friend Recommender for Online Communities

Recommendations to connect like-minded people can result in increased engagement amongst members of online communities, thus playing an important role in their sustainability. We have developed a suite of algorithms for friend recommendations using a social trust model called STrust. In STrust, the social trust of individual members is derived from their behaviours in the community. The unique features of our friend recommendation algorithms are that they capture different behaviours by (a) distinguishing between passive and active behaviours, (b) classifying behaviours as contributing to users' popularity or engagement and (c) considering different member activities in a variety of contexts. We present our social trust based recommendation algorithms and evaluate them against algorithms based on the social graph (such as Friends-Of-A-Friend). We use data collected from the online CSIRO Total Wellbeing Diet portal which has been trialled by over 5,000 Australians over a 12 week period. Our results show that social trust based recommendation algorithms outperform social graph based algorithms.

CSIRO seminar room, 4-5
Monday 9 December
Liyuan Zhou (ANU)
Investigating indexing units for Chinese web information retrieval: Chinese word segmentation versus n-grams
PengFei (Vincent) Li (ANU)
Merging algorithms for meta-search
CSIRO seminar room, 4-5

Speakers in 2012

(Jump to: 2015, 2014, 2013, 2011, 2010, 2009, 2008, 2007, 2006.)

Monday 20 February
Paul Thomas (CSIRO)
Match report from SWIRL
I'll quickly point out some highlights of SWIRL, a workshop in Lorne to discuss future directions in IR research.
CSIRO seminar room, 4-5
Monday 5 March
Scott Sanner (NICTA)
New objectives for social collaborative filtering
This paper examines the problem of social collaborative filtering (CF) to recommend items of interest to users in a social network setting. Unlike standard CF algorithms using relatively simple user and item features, recommendation in social networks poses the more complex problem of learning user preferences from a rich and complex set of user profile and interaction information. Many existing social CF methods have extended traditional CF matrix factorization, but have overlooked important aspects germane to the social setting. We propose a unified framework for social CF matrix factorization by introducing novel objective functions for training. Our new objective functions have three key features that address main drawbacks of existing approaches: (a) we fully exploit feature-based user similarity, (b) we permit direct learning of user-to-user information diffusion, and (c) we leverage co-preference (dis)agreement between two users to learn restricted areas of common interest. We demonstrate that optimizing the new objectives significantly outperforms a variety of CF and social CF baselines on live user trials in a custom-developed Facebook App involving data collected over two months from over 100 App users and their 34,000+ friends.
This work will appear at WWW 2012 and is co-authored with Joseph Noel, Khoi-Nguyen Tran, Peter Christen, Lexing Xie, Edwin Bonilla, Ehsan Abbasnejad, and Nicolas Della Penna.

CSIRO seminar room, 4-5
Monday 19 March
Elly Liang (ANU)
User profiling based on folksonomy information in Web 2.0 for personalized recommender systems
This thesis proposed novel approaches that use the emerging user information in Web 2.0 to help users solve the information overload issue. User-created content description and classification information (folksonomy) was used to find users' interests. Based on users' interest profiles, personalized recommendations can be generated for each user. This thesis contributes approaches that effectively use the wisdom of crowds to provide more accurate user profiling and recommendation.
The thesis can also be downloaded from the following link: http://eprints.qut.edu.au/41879/

CSIRO seminar room, 4-5
Monday 2 April
Paul Rivera and William Han (NICTA)
Introducing OpinionWatch
OpinionWatch is a tool developed over the last 3 years at NICTA for visually and interactively exploring large document sets. The tool integrates topic modeling, named entity recognition, keyphrase extraction, and sentiment analysis into a unified interface where mouseovers and mouse clicks allow one to efficiently browse through visual data summaries and drill down to relevant source content.
A text description probably does not do it justice... best to see it in action.

CSIRO seminar room, 4-5
Monday 30 April
Rob Ackland (ANU)
Revealed preference in clickstream networks
Revealed preference theory is used to test whether a given website clickstream network (showing the flow of attention or "eyeballs" between websites) could have been generated by a utility maximizing agent, that is, whether web users share common preferences over the consumption of website content. The revealed preference test of common preferences involves consumption quantities (clickstreams) and associated prices, which we derive from the hyperlink network of the websites. The hyperlink network is constructed by webmasters who link to sites that they believe will be of interest or useful to web users, and we interpret the geodesic distance from site i to site j as a measure of the price that web user i "pays" to consume content from site j. So we assume, for example, that while Australian web user i knows about Chinese website j, there is a high price of consuming this content (because of the need to use translating software, for example) compared with consuming content from a website closer in the hyperlink network. An application using clickstream and hyperlink data for around 1000 websites is provided.
CSIRO seminar room, 4-5
Monday 14 May
Chengjun Wang (City University of Hong Kong)
Jumping over network threshold: News diffusion on news sharing website
The rise of social media, especially the news sharing website (NSW), revives the classic studies of news diffusion, and nowadays information diffusion has been extensively explored. However, there is a puzzle of limited diffusion range (Lerman et al. 2011, Bakshy et al. 2011, Leskovec et al. 2006, Sun et al. 2009). This talk aims at gauging how widely news can diffuse on NSWs, and what the determinants are, by introducing the concept of the news sharing website (NSW) and briefly reviewing the related theories of diffusion. This study draws on the measure of threshold, and attempts to distinguish how the news aggregating function of NSWs, social influence, and homophily influence news diffusion on both Digg and Sina Weibo. The results reveal that: first, diffusion range on Digg follows a log-normal distribution, while it follows a power-law distribution on Sina Weibo; second, news jumps in the social network of Digg, and non-interpersonal effects play an important role in information spreading, while news infects individuals continuously on Sina Weibo (so far it's only a conjecture). The use of epidemic models, sandpile models, and Hack's law will be briefly discussed in terms of information flow systems.
CSIRO seminar room, 4-5
Monday 25 June
Ying-Hsang Liu (Charles Sturt University)
Controlled vocabularies and search
CSIRO seminar room, 4-5
Monday 9 July
Paul Thomas (CSIRO)
Identifying disagreement online
Online debate forums provide a powerful communication platform for individual users to share information, exchange ideas and express opinions on a variety of topics. However, it is still challenging to understand people's opinions because of informal language use and the dynamic nature of online conversations.
In this talk, we propose a new method for identifying participants' agreement or disagreement on an issue by exploiting information contained in online posts.

CSIRO seminar room, 4-5
Monday 23 July
Tom Worthington (ANU)
Green computing professional education course online
Just back from the 7th International Conference on Computer Science & Education (ICCSE 2012), Tom will reprise his presentation on how to teach on-line using e-documents. He will also give some impressions as to where Australian tertiary computer science and engineering education is heading in the face of global competition.
(Notes from Tom's conference presentation are online.)

CSIRO seminar room, 4-5
Monday 6 August
Giorgio Maria Di Nunzio (University of Padua)
Visualization of probabilistic models
Data mining applications can retrieve and explore existing information as well as extrapolate, predict, and derive new information from the given database. Classification is a special kind of prediction task which deals with the need to classify items based on previously classified training data. The research in this field has been very active in recent years. Some of the techniques presented in the literature require that the user selects the dataset and sets the values for some parameters of the algorithm, which are often difficult to determine a priori. Moreover, some of these act as black boxes, thus screening the user out of the analysis process. In this context, interpreting learned parameters and discovering the causal process underlying observed data become difficult tasks. Data mining applications may benefit significantly by providing visual feedback and summarization. Visual data mining is a general approach which aims to include the human in the data exploration process, thus gaining benefit from their perceptual abilities. In particular, users often want to validate and explore the classifier model and its output or understand the classification rationale. To address these issues, the classification system should have an intuitive and interactive explanation capability.
I'll present a visualization tool for Bayesian classifiers that can help the user understand how far the classifier is from the correct decision given a vector of parameters in input. The user can interact with the classifier by: (i) selecting different models, (ii) changing the prior's parameters and (iii) tuning the mis-classification costs.

CSIRO seminar room, 4-5
Wednesday 5 September
Ricardo Baeza-Yates (Yahoo! Research)
The web: Wisdom of crowds and a long tail
The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web as well as more than one billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. But what is the Web like? Web data mining is the main way to answer this question. Web data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), implying different techniques such as text, graph or log mining. Each case reflects the wisdom of some group of people that can be used to make the Web better. One example is user-generated tags on Web 2.0 sites. One important phenomenon of this wisdom is the long tail of the special interests of people. In this talk we cover all these concepts and give specific examples.
Ricardo Baeza-Yates is VP of Yahoo! Research for Europe, Middle East and Latin America, leading the labs at Barcelona, Spain and Santiago, Chile, as well as supervising the newer lab in Haifa, Israel. Until 2005 he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile; and ICREA Professor at the Department of Technology of the University Pompeu Fabra in Barcelona, Spain. He is co-author of the best-selling book Modern Information Retrieval, published in 1999 by Addison-Wesley with a second edition in 2011, as well as co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 200 other publications. He has received the Organization of American States award for young researchers in exact sciences (1993) and several national awards in Chile. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences. During 2007 he was awarded the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni. In 2009 he was awarded the Latin American distinction for contributions to CS in the region and became an ACM Fellow, followed in 2011 by an IEEE Fellowship.
Seminar room (N101), 2-3
Monday 17 September
Simon Kravis
A study of document use on file servers
CSIRO seminar room, 4-5
Monday 15 October
David Hawking (Funnelback)
Gentlemen Prefer Blends
Modern search engines implement a variety of techniques for generating variants of the query actually submitted. These include spelling suggestion, synonyms, queries related by co-clicking, queries at the end of within-session refinement sequences, acronym expansion/generation, compounding/decompounding, conflation of British/American spellings, and possibly translation. Variants may be suggested to the user or they may be run along with the original query behind the scenes, influencing the result set and its ranking. I would like to talk informally about the problem of query blending and the questions it raises: Which techniques to use for generating variants? How many and which candidate variants to actually run? How to blend scores from variants and the original query? How to evaluate performance of blending?
(Dave's bio)
CSIRO seminar room, 4-5
Monday 12 November
Elly Liang (ANU)
Time-aware Topic Recommendation Based on Micro-blogs
Topic recommendation can help users deal with information overload in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic, and to profile users' time-drifting topic interests. Content-based, Nearest Neighborhood-based and Matrix Factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com.
CSIRO seminar room, 4-5
Monday 26 November
Kar Wai Lim (NICTA)
Relevance vs diversity
CSIRO seminar room, 4-5

Speakers in 2011

(Jump to: 2015, 2014, 2013, 2012, 2010, 2009, 2008, 2007, 2006.)

Monday 7 February
Jan-Felix Schmakeit (UniSA)
From lemons and broomsticks: Eye-tracking for the evaluation of search engines
Jan-Felix will tell us a little about how he's spending his summer: working on eye tracking of a few different search engines.
CSIRO seminar room, 4-5
Monday 21 February
Tim Jones (ANU)
Intentionally degrading search engine results
Tim will be telling us about some recent work where he deliberately made search results worse. (Photo available)
CSIRO seminar room, 4-5
Monday 7 March
Paul Thomas (CSIRO)
Recommending pages we haven't even seen
I'll be talking about some ongoing work building a system which can recommend web pages it doesn't actually know about.
CSIRO seminar room, 4-5
Monday 21 March
Tim Jones (ANU)
User reactions to spam
Tim has volunteered to tell us about some of his recent cogitations on user reactions to spam.
CSIRO seminar room, 4-5
Monday 4 April
Simon Kravis (Fujitsu)
Filesystems: How they are constructed, managed and used in large organisations
Despite the increasing use of document management systems within large organisations, filesystems are still very widely used for storing files from desktop applications. Volume, file count and use profiles reveal some unexpected characteristics which offer the potential for improved management.
CSIRO seminar room, 4-5
Monday 2 May
Richard Jones (Lloyd Jones Consulting)
STATUS: IR stories from the Dark Ages
[I'll] probably talk about the file structures, portability, applications, relevance ranking
CSIRO seminar room, 4-5
Tuesday 3 May
Australian Demographic & Social Research Institute (ADSRI) seminars
Lingfei Wu (City University of Hong Kong)
How the Web1.0 fails: The mismatch between hyperlinks and user-flow
Hai Liang (City University of Hong Kong)
The structure of public expression and issue rise: Participation heterogeneity, concentration, and timing in internet forums
Seminar Room A, Coombs Building, 3:30-5
Monday 16 May
Joseph Noel (NICTA): Social recommendation and collaborative filtering
Oulin Yang (NICTA): Time and event extraction from news
CSIRO seminar room, 4-5
Thursday 26 May
Computer Science seminar
Ying-Hsang Liu (Charles Sturt University): Query reformulation, individual difference and search performance over query sessions
Query reformulation is the user's articulation of information needs after the initial search. In this talk I will report on findings from a larger study that was designed to assess the effectiveness of MeSH (Medical Subject Headings) terms when used by different types of searchers in an interactive search environment. I will examine the characteristics of queries formulated by different types of searchers, exemplified by different levels of domain knowledge and search training. Using several effectiveness measures, such as MAP and nsDCG (normalised session DCG), I will discuss the relationship between individual difference and search performance over query sessions.
ANU seminar room (N101), 4-5
Monday 30 May
Hai Liang (City University of Hong Kong): Public expression in web forums: Structure characteristics, issue rise, and opinion climate
The implications of unrestricted web forum discussions for civil society and political deliberation have prompted much debate in the past decade. This study investigates the structure and opinion climate of public expression in web forums. I analyzed 110 threads including more than 3,000 posts that were collected from a famous Chinese Internet forum. Although inequality of participation has been considered an obstacle to a strong and healthy civil society, this study found that it works in favor of the rise of public issues. However, there is no direct relationship between the salience of a public issue and a supportive opinion climate in the threads. The climate of opinion appears to depend on the distribution of the opinions expressed by the early repliers in the threads.
(papers and slides)
Lingfei Wu (City University of Hong Kong): How Web1.0 fails: The mismatch between hyperlinks and clickstreams
The mismatches between "top-down" systems and "bottom-up" patterns exist widely in human society. People bothered by the problem include Walter Burley Griffin, M. S. Gorbachev, and Tim Berners-Lee. In this presentation, I shall show the mismatch between a "top-down" system (a WWW hyperlink network comprised of the top 1000 websites in the world) and "bottom-up" patterns (clickstreams generated by users) in the virtual world. Moreover, I shall discuss how "survival of the fittest" in the virtual world leads to the evolution of the WWW from Web 1.0 to Web 2.0.
CSIRO seminar room, 4-5
Thursday 9 June
iHcc: information and Human-centred computing seminar
Cécile Paris (CSIRO)
Helping readers browse through linked documents
With the continuous growth of information and its high connectivity, it is hard to browse through large interconnected information spaces to learn about a topic, knowing when to follow which links and how not to get lost in hyperspace. Our aim is to support people who read documents in a highly connected information space, helping them stay focused. Our contextually-aware in-browser text summarisation tool does this by capturing users' current interests and providing users with contextualised summaries of linked documents, to help them decide whether the link is worth following. In this talk, I will present two prototype systems that illustrate this concept: IBES supports a user reading Wikipedia articles, and CSIBS supports researchers browsing scientific material.
ANU seminar room (N101), 11-12
Monday 27 June
Lexing Xie (ANU)
Multimedia retrieval: visual concept-based query expansion and re-ranking
Semantic concept-based query expansion and re-ranking is an important sub-problem in multimedia retrieval. In particular, we explore the utility of a fixed lexicon of visual semantic concepts for automatic multimedia retrieval and re-ranking purposes. In this paper, we propose several new approaches for query expansion, in which textual keywords, visual examples, or initial retrieval results are analyzed to identify the most relevant visual concepts for the given query. These concepts are then used to generate additional query results and/or to re-rank an existing set of results. We develop both lexical and statistical approaches for text query expansion, as well as content-based approaches for visual query expansion. In addition, we study several other recently proposed methods for concept-based query expansion. In total, we compare 7 different approaches for expanding queries with visual semantic concepts. They are evaluated using a large video corpus and 39 concept detectors from the TRECVID-2006 video retrieval benchmark. We observe consistent improvement over the baselines for all 7 approaches, leading to an overall performance gain of 77% relative to a text retrieval baseline, and a 31% improvement relative to a state-of-the-art multimodal retrieval baseline.
CSIRO seminar room, 4-5
Monday 11 July
Peter Christen (ANU) (work done with Dinusha Vatsalan and Vassilios Verykios)
Scalable privacy-preserving record linkage using similarity-based indexing
Privacy-preserving record linkage comprises techniques that allow the scalable, automatic and accurate matching of databases across organisations such that no sensitive or confidential information needs to be revealed by the database holders, and the parties involved learn only about the matched records.
In this presentation I will provide some background on this topic, illustrate the challenges involved in privacy-preserving record linkage, and present a novel scalable protocol for privacy-preserving record linkage which we have developed in the past few months.

CSIRO seminar room, 4-5
Monday 8 August
Alexander Krumpholz (CSIRO and ANU)
Medical literature retrieval
CSIRO seminar room, 4-5
Monday 22 August
Victoria Redfern (Redfern)
Introducing Rootza, a search term generator
CSIRO seminar room, 4-5
Monday 5 September
Paul Thomas (CSIRO)
Match report from SIGIR 2011
I'll quickly point out some highlights of this year's SIGIR conference.
CSIRO seminar room, 4-5
Monday 19 September
Dave Hawking (Funnelback)
How modern hardware facilitates large-scale indexing and searching
(e.g. ClueWeb09 cat B and Tweets96)
Expertise finding systems in practice
With reference to Australia's Knowledge Gateway and (?) CSIRO PeopleFinder
CSIRO seminar room, 4-5
Monday 17 October
Simon Kravis (Fujitsu)
Information retrieval of authored documents within a large software development project
(Paul will be away in October. Please contact Tom Gedeon if you have any questions about IR and friends on the 17th or 31st.)
CSIRO seminar room, 4-5
Monday 14 November
Dave Hawking (Funnelback)
Match report from CIKM 2011
Some highlights of the CIKM 2011 conference
CSIRO seminar room, 4-5
Monday 28 November
Research School of Computer Science seminar
Chris Clifton (Purdue University)
Freeing cloud databases from privacy constraints
Privacy regulations can constrain how data is managed, particularly trans-border sharing and storage of data. This has significant implications for the use of cloud databases to manage private data. While management of encrypted data has received some research attention, this limits the services that can be provided by a cloud database. We propose to instead encrypt only the link between identifying data and sensitive information, thus eliminating the "individually identifiable" aspect of data that triggers most privacy regulations. This frees the cloud database to provide value-added services such as data cleansing and data analysis without constraint of privacy regulations.
This talk will discuss our early results in this area, including schema development (how do we ensure that sensitive information cannot be identified?) and query processing (how do we handle the fact that part of the data needed to process the query is encrypted, and only the client has the key?) In addition to our existing results, we will discuss ongoing work, including the challenges posed by database systems designed specifically for cloud computing.

RSCS seminar room (N101), 4-5
Thursday 1 December and Friday 2 December
Australasian Language Technology Workshop
At the ANU, all day
Friday 2 December
Australasian Document Computing Symposium
At the ANU, all day
Monday 12 December
Sci-fi Christmas special
CSIRO seminar room, 4-5

Speakers in 2010

(Jump to: 2015, 2014, 2013, 2012, 2011, 2009, 2008, 2007, 2006.)

Monday 18 January
Paul Thomas (CSIRO)
Component-level evaluation with best possible performance
CSIRO seminar room, 4-5
Monday 1 March
Tim' Jones (ANU)
The great market day experiment
CSIRO seminar room, 4-5
Thursday 4 March
Paul Thomas (CSIRO and ANU)
Information retrieval for real-world tasks
N101, 4-5
Monday 15 March
Rob Ackland (RSSS, ANU)
Analysing social networks with NodeXL and VOSON
CSIRO seminar room, 4-5
Monday 29 March
Shengbo Guo (NICTA)
Probabilistic latent maximal marginal relevance
Diversity has been heavily motivated as an objective criterion for result sets in the information retrieval literature and various ad-hoc heuristics have been proposed to explicitly optimize for it. In this talk, we will start from first principles and show that optimizing a simple criterion of set-based relevance in a latent variable graphical model—a framework we refer to as probabilistic latent accumulated relevance (PLAR)—leads to diversity as a naturally emergent property of the solution. PLAR derives variants of latent semantic indexing (LSI) kernels for relevance and diversity and does not require ad-hoc tuning parameters to balance them. PLAR also directly motivates the general form of many other ad-hoc diversity heuristics in the literature, albeit with important modifications that we show can lead to improved performance on a diversity testbed from the TREC 6-8 Interactive Track.
I received a Bachelor's degree in computer science in 2004, and a Master's degree in pattern recognition and intelligent systems in 2007. I moved to Australia for a PhD program in computer science in July 2007. Currently, I am a PhD candidate at the Australian National University, and a Graduate Researcher in the National ICT Australia.

CSIRO seminar room, 4-5
Monday 12 April
Tim Jones (ANU)
A different approach to user experiments
CSIRO seminar room, 4-5
Monday 19 April
Doug Oard (Maryland)
Who 'dat?: identity resolution in large email collections
Automated techniques that can support the human activities of search and sense-making in large email collections are of increasing importance for a broad range of uses, including historical scholarship, law enforcement and intelligence applications, and lawyers involved in "e-discovery" incident to civil litigation. In this talk, I'll briefly describe some of the work to date on searching large email collections, and then for most of the talk I will focus on the more challenging task of support for sense-making. Specifically, I'll describe joint work with Tamer Elsayed to automatically resolve the identity of people who are mentioned ambiguously (e.g., just by first name) in a collection of email from a failed corporation (Enron). Our results indicate that for people who are well represented in the collection we can use a generative model to guess the right identity about 80% of the time, and for others we are right about half the time. I'll conclude the talk with a few remarks on our next directions for techniques, evaluation, and additional types of collections to which similar ideas might be applied.
N101, 2-3
Monday 10 May
Xuan Zhou (CSIRO)
Web service retrieval: are pre- and post-conditions different?
CSIRO seminar room, 4-5
Monday 24 May
Tom Rowlands (CSIRO and ANU)
Match report from WWW
CSIRO seminar room, 4-5
Monday 7 June
Wray Buntine (NICTA)
Discriminative IR
A general principle of machine learning is that discriminative classification generally works better than generative classification. No doubt, the same principle applies to information retrieval. The language modelling approach to IR, as it is usually implemented, is a simple generative model and, according to some analysts, is little different from standard approaches. Here we discuss a discriminative model.
CSIRO seminar room, 4-5
Monday 21 June
Paul Thomas (CSIRO and ANU)
Interfaces for government metasearch
CSIRO seminar room, 4-5
Thursday 1 July (note different day)
Glen Newton (Carleton University)
Visualizing a large journal collection using semantic indexing for use in search query refinement
We examine the scalability and utility of semantically mapping (visualizing) journals in a large scale (5.7+ million) science, technology and medical article digital library. This work is part of a larger research effort to evaluate semantic journal and article mapping for search query results refinement and visual contextualization in a large scale digital library. In this work the Semantic Vectors software package is parallelized and evaluated to create semantic distances between 2365 journals, from the sum of their full-text. This is used to create a journal semantic map whose production does scale and whose results are comparable to other maps of the scientific literature.
This presentation represents the state of the project, and will discuss the new work planned while at ANU.

CSIRO seminar room, 4-5
Monday 5 July
Andrew Gall (ANU and CSIRO)
Visualising social media
CSIRO seminar room, 4-5
Tuesday 13 July
Sam Huston (University of Massachusetts Amherst)
Evaluating verbose query processing techniques
N101, 4-5
Thursday 15 July
Paul Thomas (CSIRO and ANU)
Interfaces for government metasearch
N101, 4-5
Monday 2 August
David Hawking (Funnelback) and Paul Thomas (CSIRO and ANU)
Match report from SIGIR
CSIRO seminar room, 4-5
Monday 16 August
David Hawking (Funnelback)
Search quality tuning
CSIRO seminar room, 4-5
Monday 30 August
Sam Huston (University of Massachusetts Amherst)
Efficient indexing of repeated n-grams
The identification of repeated n-gram phrases in text has many practical applications, including authorship attribution, text reuse identification, and plagiarism detection. We consider methods for finding the repeated n-grams in text corpora, with emphasis on techniques that can be effectively scaled across a cluster of processors to handle very large amounts of text. We compare our proposed method to existing techniques using the 1.5 TB TREC ClueWeb-B text collection, using both single-processor and multi-processor approaches. The experiments show that our method offers a useful tradeoff between speed and temporary storage space, and provides an alternative to previous approaches that scales almost linearly in the length of the sequence, is largely independent of n, and provides a uniform workload balance across the set of available processors.
CSIRO seminar room, 4-5
Monday 13 September
Guido Zuccon (University of Glasgow)
The quantum probability ranking principle
In this talk I will present a new approach to the problem of ranking documents based on quantum probability theory. Key to the approach is the idea of Quantum Interference happening between document relevance judgements. This is formed through an analogy between the classic double slit experiment in physics and document ranking in IR. The analogy leads to a novel ranking principle, the Quantum Probability Ranking Principle (QPRP). I will suggest an instantiation of the ranking principle and explain how this can be operationalised, and how it relates to other ranking principles and strategies. In particular, I will show how the QPRP extends the classical PRP. Then, I show some experiments using this new principle on the IR task of subtopic retrieval. I will show that on this task the QPRP outperforms the PRP and state of the art approaches, showing that quantum probability theory can be successfully applied in IR. Finally, I would like to share with you some considerations about ranking strategies and ranking tasks in IR.
Reference material can be found at:
CSIRO videoconference room (note change of venue; this is on the top floor above the tearoom), 4-5
Monday 11 October
Paul Thomas (CSIRO and ANU)
Interaction differences in web server logs
CSIRO seminar room, 4-5
Monday 25 October
Bruce Croft (University of Massachusetts Amherst)
Thoughts (and research) on query intent
(papers and slides)
CSIRO seminar room, 4-5
Monday 8 November
David Hawking (Funnelback)
Search clouds on the horizon: can they stop inefficient retrieval systems warming the planet?
Search facilities for organisations need hardware redundancy to avoid downtime and spare capacity to cope with peaks in query load. The end result can be multiple servers each emitting more than a tonne of CO2 per year while sitting idle up to 90% of the time! Many environmentally responsible and cost-conscious organisations attempt to address this problem through virtualisation, but encounter serious response time issues when physical hardware is over-allocated or when different VMs compete for a single I/O subsystem. My talk will analyse the potential of "search clouds" to provide the necessary peak capacity and redundancy while substantially reducing emissions (and cost). It will also attempt to quantify the CO2 costs of inefficiencies in component algorithms inherent in an enterprise search facility, using typical parameters for representative organisation types.
CSIRO seminar room, 4-5
Monday 22 November
Alex Krumpholz (CSIRO and ANU)
CSIRO seminar room, 4-5
Monday 6 December
Khoi-Nguyen Tran (ANU)
Sensor networks/semantic Web
CSIRO seminar room, 4-5
Monday 20 December
Plans for 2011
CSIRO seminar room, 4-5

Speakers in 2009

(Jump to: 2015, 2014, 2013, 2012, 2011, 2010, 2008, 2007, 2006.)

Monday 19 January
Paul Thomas (CSIRO): Japanese food in distributed IR
CSIRO seminar room, 4-5
Monday 2 February
Tom Rowlands (CSIRO and ANU): quite the machine translation which is simple
CSIRO seminar room, 4-5
Monday 16 February
Tim' Jones (ANU): examining web spam collections
CSIRO seminar room, 4-5
Monday 2 March
Xuan Zhou (CSIRO): from keywords to structured query—incremental query construction for semantic data
CSIRO seminar room, 4-5
Monday 30 March
Peter Christen (ANU): accurate synthetic generation of realistic personal information
Work with: Agus Pudjijono
CSIRO seminar room, 4-5
Monday 11 May
David Hawking (Funnelback): indexing and searching UK2007
and Tim Jones (ANU): match report from WWW
CSIRO seminar room, 4-5
Tuesday 19 May
Peter Christen (ANU): privacy-preserving data sharing and matching
RSISE seminar room, 4-5
Monday 25 May
Amir Hadad (ANU): Breast Cancer Data: How can it be analyzed?
CSIRO seminar room, 4-5
Monday 22 June
Andrew Lampert (CSIRO and Macquarie): email segmentation and recognising zones of text
Ian Ross seminar room, 4-5
Monday 6 July
Matt Adcock (CSIRO): one or two uses for temporal information
Ian Ross seminar room, 4-5
Monday 3 August
Paul Thomas (CSIRO): match report from SIGIR
Ian Ross seminar room, 4-5
Monday 10 August (note special date)
Sam Huston (UMass): detecting text reuse
Ian Ross seminar room, 4-5
Monday 31 August
Sukanya Manna (ANU): sentence similarity and document signatures
CSIRO seminar room, 4-5
Monday 14 September
Ramesh Sankaranarayana (ANU): assessing the quality of health information on the web
CSIRO seminar room, 4-5
Friday 25 September
Digital culture talk
Bernard de Broglio (Mosman Council): Dr Strangelove or: how I learned to stop worrying and love the gov
National Library theatre, 12:30-1:30
Monday 28 September
Tom Rowlands (CSIRO): Tweets as annotations
CSIRO seminar room, 4-5
Monday 12 October
Tim Jones (ANU): Experiment participation
CSIRO seminar room, 4-5
Monday 26 October
Peter Christen (ANU): Similarity-aware indexing for real-time entity resolution
CSIRO seminar room, 4-5
Monday 9 November
Amir Hadad (ANU): Breast cancer survival prediction
CSIRO seminar room, 4-5
Monday 23 November
Ying-Hsang Liu (Charles Sturt University): the impact of MeSH terms on search effectiveness
CSIRO seminar room, 4-5

Speakers in 2008

(Jump to: 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2007, 2006.)

Monday 11 February
DCS seminar: Jeff Jonas (IBM): entity resolution
DCS seminar room, 9:30-11:30
Monday 11 February
Peter Bailey (CSIRO): what happens if we change relevance judges?
CSIRO seminar room, 4-5
Monday 25 February
Markus Weimer (NICTA): collaborative filtering
CSIRO seminar room, 4-5
Tuesday 18 March
NICTA seminar: Raquel Mochales Palau (Katholieke Universiteit Leuven): automatic detection of argumentation
Contact jon.gray@nicta.com.au
Monday 7 April
Tom Rowlands (CSIRO and ANU): evidence combination
CSIRO seminar room, 4-5
Wednesday 9 April
CSIRO seminar: Mounia Lalmas (Queen Mary): Information retrieval: from focused to aggregated answers
CSIRO seminar room, 9:30-10:30
Wednesday 9 April
CSIRO seminar: Thomas Roelleke (Queen Mary): Modelling retrieval models in a probabilistic relational algebra with a new operator: the relational Bayes
CSIRO seminar room, 2-3
Tuesday 15 April
NICTA seminar: Jose Alferes (Universidade Nova de Lisboa): Efficiently combining rules and ontologies
NICTA seminar room, 4-5
Monday 21 April
Rob McArthur: detecting synonyms using blog data
CSIRO seminar room, 4-5
Thursday 15 May
DCS seminar: Peter Christen (ANU): Automatic training example selection for scalable unsupervised record linkage
and Denny (ANU): Exploratory hot spot profile analysis using interactive visual drill-down self-organizing maps
ANU seminar room (N101), 4-5
Monday 19 May
Sukanya Manna (ANU): term association in single documents
CSIRO seminar room, 4-5
Monday 2 June
Tim Jones (ANU): research I'm about to do in adversarial IR
CSIRO seminar room, 4-5
Monday 16 June
Peter Christen (ANU): match report from PAKDD and thoughts about synergies between IR and data mining
CSIRO seminar room, 4-5
Tuesday 24 June
CSIRO seminar: Emily Zhou (CSIRO): FRAS: A Symbolization-based Approach to Video Similarity Search
Monday 30 June
Wray Buntine (NICTA): An informal look at probabilistic models of IR
CSIRO seminar room, 4-5
Monday 28 July
Tara McIntosh (USyd)
CSIRO seminar room, 4-5
Tuesday 29 July
NICTA seminar: Dawei Song (The Open University): Learning context-sensitive term associations for intelligent information retrieval
RSISE Seminar Room, ground floor, building 115, cnr. North and Daley Roads
Monday 11 August
Paul Thomas (CSIRO): match report from SIGIR
CSIRO seminar room, 4-5
Friday 22 August
ADSRI seminar: Robert Ackland (ANU): The visibility of government on the web: insights from a large-scale crawl
Coombs seminar room A, 12:30-2
Monday 25 August
Nina Studeny (Fachhochschule Technikum Wien): analysing XML corpora
CSIRO seminar room, 4-5
Monday 8 September
Alex Krumpholz (CSIRO): matching medical records
CSIRO seminar room, 4-5
Monday 22 September
David Hawking (Funnelback): experiences in turning IR theory into practice
CSIRO seminar room, 4-5
Monday 20 October
Tim' Jones (ANU): Investigating the effect of spam results on user experience
CSIRO seminar room, 4-5
Monday 17 November
Tom Rowlands and Alex Krumpholz (CSIRO and ANU): match report from HCSnet; and
Dave Hawking (Funnelback): match reports from the IR Facility Symposium and SPIRE; and
Paul Thomas (CSIRO): match report from IIiX II
smaller CSIRO seminar room, 4-5
Monday 1 December
Tom Rowlands (CSIRO and ANU): anonymous folksonomies for small enterprise webs
CSIRO seminar room, 4-5
Monday 15 December
Jan-Felix Schmakeit (UniSA): image labelling with clickthroughs
CSIRO seminar room, 4-5

Speakers in 2007

(Jump to: 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2006.)

Monday 26 March
Peter Bailey (CSIRO): the effect of branding on perceptions of search quality
CSIRO seminar room, 3-4
Monday 23 April
Tom Rowlands (CSIRO): sampling from query logs
CSIRO seminar room, 3-4
Monday 23 April
DCS seminar: Peter Bailey: secure search inside the enterprise
DCS seminar room, 4-5
Monday 14 May
DCS seminar: Paul Thomas: sampling random documents from uncooperative search engines
DCS seminar room, 4-5
Wednesday 30 May
DCS honours seminar: Yinghua Zheng: privacy-preserving string comparisons
DCS seminar room, 11-11:30
Monday 28 May
Rob McArthur (CSIRO): finding experience
CSIRO seminar room, 3-4
Monday 25 June
Tim Jones (DCS): calculating PageRank in O(reasonable)
CSIRO seminar room, 3-4
Monday 16 July (note special date)
Anuj Kumar and Deepak Agrawal (CSIRO): the TREC enterprise track at CSIRO
CSIRO seminar room, 3-4
Tuesday 17 July
DCS postgraduate meeting: Tom Rowlands (CSIRO): sampling from query logs
DCS seminar room, 4:30-5
Wednesday 25 July
DCS mid-term honours seminar: Lan Du: ontology-driven text mining for digital forensics
DCS seminar room, 2:23-2:43
Monday 3 September (Rescheduled)
Amir Hadad (DCS): fuzzy logic in IR
CSIRO seminar room, 3-4
Monday 10 September (Note special date)
Peter Bailey, Tom Rowlands, David Hawking, and Ross Wilkinson (all CSIRO): match reports from SIGIR and MSR Beijing
and Sukanya Manna (DCS): a fuzzy relational model to calculate the relatedness of entities
CSIRO seminar room, 3-4
Wednesday 19 September
DCS seminar: Roger Clarke: Big Brother Google?
DCS seminar room, 5-6
Monday 24 September
George Ferizis (CSIRO): Using personal attributes as context
CSIRO seminar room, 3-4
Monday 8 October
CSIRO/DCS seminar: Milad Shokouhi (RMIT): Federated text retrieval from independent collections
DCS seminar room, 3-4
Monday 22 October
Tim Jones (DCS): IR problems in music collections
CSIRO seminar room, 3-4
Monday 26 November
Tom Rowlands (CSIRO): lightweight enterprise tagging
and Peter Bailey (CSIRO): the TREC enterprise track
CSIRO seminar room, 3-4

Speakers in 2006

(Jump to: 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007.)

26 April
Peter Christen (DCS) and Tom Rowlands (CSIRO)
24 May
James Sinclair (Engineering)
19 July
Eric McCreath (DCS)
13 September
Alex Krumpholz (CSIRO)

Want to present something?

If there's any work in progress (or completed, or not really started) which you would like to share over wine and cheese, please let Paul know. You don't need finished work, and you don't need an hour's worth of fancy slides, just a willingness to talk about what you're up to. The idea is to get discussion going, not to present eternal verities.