IR and friends

This is a discussion group for people working in information retrieval, data mining, document computing, social media, and similar fields. Everyone's welcome.

We meet every second Monday, from 4 to 5pm, in the CSIRO seminar room in the ANU CS&IT building. CSIRO are kind enough to provide wine (and soft drinks) and cheese. We occasionally have seminars and events at other times as well.

"IR and friends" aims to encourage discussion between people working in similar fields; to provide a venue for feedback on work in progress; and to get people from different groups talking to each other.

Want to present something? If there's any work in progress (or completed, or not really started) which you would like to share over wine and cheese, please let Paul or Gaya know. You don't need finished work, and you don't need an hour's worth of fancy slides, just a willingness to talk about what you're up to. The idea is to get discussion going, not to present eternal verities.

Upcoming meetings

Monday 20 June
Michael Curtotti (ANU)
TBA
CSIRO seminar room, 4-5
Mondays 4 and 18 July; 1, 15, and 29 August; 12 and 26 September
TBA
CSIRO seminar room, 4-5
Monday 10 October
Jae Kim (ANU)
Pagination versus scrolling in mobile web search

Vertical scrolling is the standard method of exploring search results pages. For touch-enabled mobile devices that are not equipped with a mouse or keyboard, we adopt other methods of controlling the viewport with the aim of investigating search interaction. From the intuition that people are used to reading books by turning pages horizontally, we conducted a user experiment to investigate the effects of horizontal and vertical control types (pagination versus scrolling) on a touch-enabled mobile phone. Our findings suggest that pagination improves search over scrolling, despite scrolling being more familiar. The main reason for this is the time taken for the scroll itself. Participants using scrolling also spend less time reading lower-ranked results with lower search accuracy even if this is where relevant documents are found. We conclude that search engines need to provide different viewport controls to allow a better search experience on touch-enabled mobile devices.

CSIRO seminar room, 4-5
Mondays 24 October; 7 and 21 November; 5 December
TBA
CSIRO seminar room, 4-5

Past meetings in 2016

Here's what we did earlier:

Monday 11 April
David Hawking (Microsoft)
Simulated text corpora

We propose a method to generate simulated text corpora of arbitrary size. Such corpora are potentially useful when working with private data or as a means to reproducible studies of the efficiency and scalability of retrieval algorithms. For eight different corpora we extract attributes and model the distributions of both term frequencies (piecewise linear with special treatment of head and tail) and document lengths (Gaussian). We model how those attributes and distributions change across samples of a corpus as the samples grow from 1% to 100% of the parent.

We use the above models and a synthetic collection generator (code to be made available as open source) to emulate each of the corpora. Our generator creates documents comprising synthetic words in random order and very accurately mimics vocabulary size and the term probability distribution of the base corpus. Using the static model for a 1% corpus and applying a generic growth model derived from multiple corpora, we are able to emulate key parameters of the original 100% corpus with reasonable accuracy.

Synthetic collections from our generator potentially allow exactly reproducible efficiency experiments and accurate study of algorithmic scalability. They avoid the normal confounds of differences in tokenization and character set conversion of normal text. Our generator mimics a real corpus with sufficient fidelity to evaluate core aspects of indexing efficiency around postings list lengths, document table and term table. With important provisos, this can be done with negligible interference to CPU and memory caches, allowing on-the-fly generation internal to an indexer. Generated corpora can be tailored to the sizes and characteristics needed for specific experiments. They can be shared with other researchers by communicating less than a kilobyte, even if derived from a private corpus.

CSIRO seminar room, 4-5
Monday 9 May
IR and friends turns 10!

"IR and friends" first met on 26 April 2006, when Peter Christen and Tom Rowlands talked about their work in progress. Since then we've had over 150 regular talks from universities, government, and industry, plus special talks from visiting colleagues from around the world. Help us celebrate ten years of research, discussion, and community.

We will mark the occasion with short talks, reflecting on research and practice, from Tom Rowlands (Australian Crime Commission), Simon Kravis (2XX), Robert Power (CSIRO), Tom Gedeon (ANU), and David Hawking (Microsoft). There'll also be cake.

CSIRO seminar room, 4-5

Earlier meetings

We also met from 2006 to 2015.