AT&T Laboratories Cambridge

ICON

Introduction

The ICON system aims to enable users to easily and effectively view, navigate, and search collections of digital images. It combines an intuitive cross-platform thumbnail-based user interface with powerful image processing and content description functionality to facilitate automated organisation and retrieval of large heterogeneous image sets based on both meta data and visual content.

In order to adapt to the varying demands of home users, professional photographers, and commercial image collections, the system is designed to be inherently flexible and extensible. A client-server split and object-oriented design provide layers of abstraction and encapsulation, thus allowing the system to be customised for a given application domain to meet specific user requirements. ICON currently consists of two parts:

The ICON client, an application written in Java which can be used either in stand-alone mode or by connecting to the ICON repository. While the current focus is on an exploration of user interface, browsing, and retrieval paradigms for medium-scale collections of personal images, it is envisaged that future client programs will cater for particular application environments such as online image searching and professional photographic archives.
The ICON repository, a central file archive and set of software tools which apply the segmentation and content classification routines developed by the PERMM project to pictures exported from the ICON client. This process enables content-based searching and visualisation of such images.

Browsing and organisation

In order to provide a quick and convenient means of viewing and manipulating images via the ICON client, the user is presented with a familiar directory tree view onto the local and remote filesystems. With a single mouse click the client can be made to scan the directory structure to search for images, create thumbnails, and extract meta data such as digital camera settings and annotations.
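The directory scan described above can be sketched as follows. This is only an illustration, not the ICON client's own code (which is written in Java): the function name `find_images` and the list of file extensions are assumptions, and thumbnail creation and meta data extraction are left out.

```python
import os

# Hypothetical set of extensions treated as images; the formats that
# ICON actually recognises are not stated in this document.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def find_images(root):
    """Walk the directory tree under `root` and return the image paths, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so that "IMG.JPG" matches.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

A caller would pass the directory selected in the tree view, for example `find_images("/home/user/photos")`, and then hand each returned path to whatever thumbnail and meta data routines are available.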

Images can then be exported to the repository to generate visual content descriptors and for archiving purposes. The repository stores meta data and images on a per-user basis but also provides support for collaborative access and for pictures and associated data to be re-imported into the ICON client. Each user may access the repository through an arbitrary number of clients running on any machine or operating system with a current version of the Java runtime environment.

Both images stored locally and those in the repository can be browsed and organised according to meta data and visual properties rather than just on a per-directory basis. The ICON client provides a range of methods for easy navigation of potentially very large image sets, including sophisticated clustering and visualisation functionality.

Image retrieval

Our research effort aims to make robust content-based image retrieval (CBIR) of general digital images a reality. The image analysis carried out by the ICON repository segments pictures into regions with associated visual properties and uses neural network classifiers to assign probabilistic labels to those regions, using semantic terms that correspond to visual categories such as grass, sky, and water.

The ICON client allows image databases to be searched according to meta data (e.g. picture date and digital camera make), annotations, and classified image content. Queries can be formulated in a number of different ways to cater for a range of different retrieval needs and levels of detail in the user's conceptualisation of desired image material. A query may comprise one or several of the following elements:

A set of weighted sample images (both positive and negative examples).
Desired image content composed by means of a sketch-based query composition tool which uses a visual thesaurus of target image content corresponding to the set of visual categories.
A textual or forms-based query expressed in OQUEL (Ontological Query Language), a novel query description language featuring a specification syntax and extensible vocabulary.
Criteria for various properties including file attributes (e.g. modification date), digital camera settings (e.g. camera model, flash), textual annotations (such as the name of a painting's artist), and constraints on visual appearance features (colour, shape, texture).

The user may assign different weights to the various elements that comprise a query and can choose from a set of similarity metrics to specify the emphasis that is to be placed on the relative localisation of target content within images and overall compositional aspects.
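One simple way to picture the weighting of query elements is a weighted average of per-element similarity scores. The sketch below is an assumption made for illustration: the element names, the non-negative weights, the scores in the range 0 to 1, and the averaging rule are all hypothetical, and ICON's actual similarity metrics are not described here.

```python
def combined_score(element_scores, weights):
    """Weighted average of per-element similarity scores (each assumed in [0, 1])."""
    total_weight = sum(weights[name] for name in element_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * score for name, score in element_scores.items())
    return weighted_sum / total_weight


def rank(images, weights):
    """Return image ids ordered from best to worst combined score."""
    return sorted(images, key=lambda image_id: combined_score(images[image_id], weights), reverse=True)


# Hypothetical query with three elements; the sample images count double.
weights = {"sample_images": 2.0, "sketch": 1.0, "meta_data": 1.0}
images = {
    "forest.jpg": {"sample_images": 0.2, "sketch": 0.8, "meta_data": 1.0},
    "beach.jpg": {"sample_images": 0.9, "sketch": 0.5, "meta_data": 1.0},
}
ranking = rank(images, weights)  # "beach.jpg" is ranked ahead of "forest.jpg"
```

Raising the weight of one element makes the ranking follow that element more closely, which is the effect the user controls when weighting a query.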

Despite its sophistication, the retrieval system is easy to use and simple queries can be created very rapidly. The search process is also interactive: after an initial search, users can provide relevance feedback by selecting a few relevant or non-relevant images, which causes the query elements to be re-weighted to match the user's retrieval requirements and expectations.
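The update rule used for relevance feedback is not given here, so the following is only a sketch of one plausible scheme. It assumes each query element yields a score between 0 and 1 per image, and it increases the weight of elements that score the selected relevant images higher than the non-relevant ones. The function name `reweight` and the `rate` parameter are hypothetical.

```python
def reweight(weights, relevant, non_relevant, rate=0.5):
    """Return new weights that favour elements separating relevant from non-relevant images.

    `relevant` and `non_relevant` are lists of per-image score dictionaries,
    keyed by the same element names as `weights`.
    """
    def mean(score_dicts, name):
        if not score_dicts:
            return 0.0
        return sum(scores[name] for scores in score_dicts) / len(score_dicts)

    updated = {}
    for name, weight in weights.items():
        # Positive when the element scores relevant images higher on average.
        separation = mean(relevant, name) - mean(non_relevant, name)
        # Multiplicative update, clamped so that weights never become negative.
        updated[name] = max(0.0, weight * (1.0 + rate * separation))
    return updated


# Hypothetical feedback: the sketch element separates the two groups, meta data does not.
old_weights = {"sketch": 1.0, "meta_data": 1.0}
relevant = [{"sketch": 0.9, "meta_data": 0.5}, {"sketch": 0.7, "meta_data": 0.5}]
non_relevant = [{"sketch": 0.2, "meta_data": 0.5}]
new_weights = reweight(old_weights, relevant, non_relevant)
```

In this example the sketch element gains weight while the meta data element is left unchanged, so a repeated search would lean more heavily on the sketch.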

Image analysis

In order to enable retrieval of images based on their visual properties and semantically labelled content, ICON performs a number of pre-processing and image analysis stages on images exported to the repository:

Image segmentation: Images are segmented into non-overlapping regions and sets of properties (localisation, shape, boundaries, colour, texture) are computed for each region.
Classification: Region descriptors computed by the segmentation algorithm are fed into artificial neural network classifiers which have been trained to label regions with class membership probabilities for a set of 12 semantically meaningful visual categories of "stuff" such as grass, sky, and skin.
Content representation: Each image is associated with content descriptors at different levels of abstraction and spatial granularity:
- Region mask: a canonical representation of the segmented image giving the absolute location of each region and the associated parameterisation.
- Region graph: graph of relative spatial relationships (adjacency, distance, joint boundary, and containment).
- Grid pyramid: for each visual category, proportion of image content which has been positively classified (as computed by the region labelling) at different regular grid spacings (1x1, image fifths, 8x8).
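The grid pyramid descriptor can be illustrated with a short sketch. It assumes that the region labelling has already been reduced to a single category name per pixel, stored as a list of rows, whereas the classifiers described above produce probabilities. It covers only regular n x n grids such as 1x1 and 8x8, since the layout of the "image fifths" level is not specified here. The function names are hypothetical.

```python
def grid_proportions(mask, category, n):
    """Proportion of pixels labelled `category` in each cell of an n x n grid.

    `mask` is a list of rows holding one category label per pixel.
    """
    height, width = len(mask), len(mask[0])
    grid = []
    for gy in range(n):
        y0, y1 = gy * height // n, (gy + 1) * height // n
        row = []
        for gx in range(n):
            x0, x1 = gx * width // n, (gx + 1) * width // n
            cell = [mask[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            hits = sum(1 for label in cell if label == category)
            # A cell is empty only when the image is smaller than the grid.
            row.append(hits / len(cell) if cell else 0.0)
        grid.append(row)
    return grid


def grid_pyramid(mask, category, levels=(1, 8)):
    """Map each grid size in `levels` to its grid of proportions."""
    return {n: grid_proportions(mask, category, n) for n in levels}


# Tiny 4x4 labelled image: sky on top, grass bottom-left, water bottom-right.
mask = [
    ["sky", "sky", "sky", "sky"],
    ["sky", "sky", "sky", "sky"],
    ["grass", "grass", "water", "water"],
    ["grass", "grass", "water", "water"],
]
sky_whole_image = grid_proportions(mask, "sky", 1)  # [[0.5]]
grass_quadrants = grid_proportions(mask, "grass", 2)  # [[0.0, 0.0], [1.0, 0.0]]
```

Coarse levels record how much of a category is present overall, while finer levels record roughly where it lies in the image.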

The choice of visual categories such as grass or water which mirror aspects of human perception allows the implementation of intuitive and versatile query composition methods while greatly reducing the search space. Through the region graph representation we can make the matching of clusters of regions invariant with respect to displacement and rotation, whereas the grid pyramid representation caters for a comparison of absolute position and size. Together, these descriptors may be regarded as an intermediate-level representation which does not preclude additional stages of visual inference and composite object recognition in light of query-specific saliency measures and the integration of contextual information.
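The invariance of relative spatial relationships can be illustrated with a small sketch. It covers only adjacency, one of the four relationships listed for the region graph, and assumes the region mask is a list of rows holding one region id per pixel. The function names are hypothetical, and this is not ICON's matching procedure.

```python
def adjacency(region_mask):
    """Set of unordered pairs of region ids that touch along a pixel edge."""
    height, width = len(region_mask), len(region_mask[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            here = region_mask[y][x]
            # Checking the pixel below and the pixel to the right covers every edge once.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width and region_mask[ny][nx] != here:
                    edges.add(frozenset((here, region_mask[ny][nx])))
    return edges


def rotate90(region_mask):
    """Rotate the mask by 90 degrees clockwise."""
    return [list(row) for row in zip(*region_mask[::-1])]


# Four regions; regions 1 and 3 never touch.
region_mask = [
    [1, 2, 3],
    [1, 2, 3],
    [4, 4, 3],
]
original_edges = adjacency(region_mask)
rotated_edges = adjacency(rotate90(region_mask))
same_graph = original_edges == rotated_edges  # True
```

Rotating the image moves every region to a new absolute position, yet the set of adjacent pairs is unchanged. A grid-based descriptor computed on the same two masks would generally differ, which is why the two representations suit different kinds of comparison.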

Screen shots

Click the links below to see some screen shots of the ICON client (these pages may require some time to load over a slow connection).

Contact information

If you have any queries about our image retrieval and browsing technology, please contact [email protected]. Some relevant technical reports can be found in the publications section of the main research web page.