Maximizing data availability

Simulated diffraction pattern from a urea nanocrystal

LCLS, the world's first hard X-ray laser, is leading a revolution in coherent X-ray imaging.

First single-shot diffraction image of a virus, the first steps of a new technology.

First ultrafast diffraction image of a Mimivirus, paving the way for new methods to image life.

Fast X-ray detectors generate vast amounts of data: the CXI detector at LCLS, for example, can record 40 TB a day.

A strong data analysis infrastructure is critical to keep up with the exponential growth rate of X-ray free-electron lasers.

Welcome to the Coherent X-ray Imaging Data Bank (CXIDB), a new database which offers scientists from all over the world a unique opportunity to access data from Coherent X-ray Imaging (CXI) experiments.

• New light sources and detectors have enabled novel experiments producing terabytes of data per day.

• It's the dawn of the data deluge era for coherent X-ray imaging.

• To make the best use of all these data, it is necessary to make them accessible.

Accessibility is crucial not only to make efficient use of experimental facilities, but also to improve the reproducibility of results and enable new research based on previous experiments.

We must all accept that science is data and data are science, and thus provide for [...] much improved data curation.
— B. Hanson, A. Sugden, B. Alberts
Science

CXIDB is dedicated to furthering the goal of making data from Coherent X-ray Imaging (CXI) experiments available to all, as well as archiving them. The website also serves as the reference for the CXI file format, in which most of the experimental data in the database are stored.

CXI version 1.4 released

Posted by Filipe Maia on June 23, 2014

This new version of the CXI format introduces special support for modular pixel detectors (e.g. the CSPAD), storing each module as a 2D array. An image from a modular detector is therefore stored as a 3D array, and an image scan as a 4D array. Score and tags fields were also added to detectors to make it easier to sort and filter large datasets, and the concept of an implicit axis was added to make scans easier to use.

Finally, a new Result class was added to store non-image analysis results. The new document can be found here.
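As a rough illustration of the modular detector layout described in this post, the sketch below stacks per-module 2D arrays into a 3D image and then stacks several images into a 4D scan using NumPy. The module count, module size, number of scan steps and the axis order are all assumptions made for the example; the CXI specification document remains the reference for the actual layout.

```python
import numpy as np

# Hypothetical dimensions, chosen only to keep the example small.
N_MODULES = 4            # number of detector modules (assumption)
MODULE_SHAPE = (8, 16)   # rows and columns of one module (assumption)
N_SCAN_STEPS = 3         # number of images in the scan (assumption)


def stack_modules(modules):
    """Stack equally sized 2D module arrays into one 3D image array."""
    return np.stack(modules, axis=0)


# One image: every module is its own 2D array, filled here with dummy values.
modules = [np.full(MODULE_SHAPE, i, dtype=np.float32) for i in range(N_MODULES)]
image = stack_modules(modules)

# One scan: a sequence of such images stacked along a new leading axis.
# Placing the scan axis first is an assumption of this sketch.
scan = np.stack([image * (step + 1) for step in range(N_SCAN_STEPS)], axis=0)

print(image.shape)  # (4, 8, 16)
print(scan.shape)   # (3, 4, 8, 16)
```

Keeping each module as a separate 2D slice avoids having to assemble the modules into a single padded image before storing them.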

Accurate macromolecular structures using minimal measurements from X-ray free-electron lasers

Posted by Filipe Maia on March 17, 2014

Johan Hattne and Nick Sauter have deposited the raw XTC files used in their latest paper published in Nature Methods.

You can download the data from its page.

Serial Femtosecond Crystallography of G Protein-Coupled Receptors

Posted by Filipe Maia on December 19, 2013

Vadim Cherezov has made available the Cheetah preprocessed images, formatted for use with CrystFEL, used in their latest Science paper on the structure of the 5-HT2B receptor bound to ergotamine grown in lipidic cubic phase.

You can obtain the data from the ID 21 page.

De novo protein crystal structure determination from X-ray free-electron laser data

Posted by Filipe Maia on November 25, 2013

Thomas Barends and Lutz Foucar have released the LCLS raw XTC files used in their Nature paper showing experimental phasing in serial femtosecond crystallography.

You can find the data in CXIDB ID 22.

LCLS XTC raw data for High-Resolution Protein Structure Determination by SFX uploaded

Posted by Filipe Maia on October 24, 2013

Sébastien Boutet has made available the XTC files related to CXIDB entry ID 17, and they are now available for download.

You can download them from the ID 17 raw data page. Please bear in mind that the files are stored on tape so it can take several minutes for the server to reply to a download request.

Nature announces Scientific Data, to help publish, discover and reuse research data

Posted by Filipe Maia on October 19, 2013

Nature Publishing Group will next spring launch Scientific Data, an open-access, online-only journal for detailed descriptions of data sets.

I think this provides a great opportunity to reward researchers who have data to publish but have difficulty publishing a suitable reference to them, giving them a chance to receive the acknowledgement they deserve.

For more details please check the announcement.

Single-particle structure determination by correlations of snapshot X-ray diffraction patterns

Posted by Filipe Maia on February 1, 2013

ID-20 has just been made available. It contains the data pertaining to the paper by D. Starodub et al. Single-particle structure determination by correlations of snapshot X-ray diffraction patterns, Nature Communications, December 2012.

CXIDB paper published

Posted by Filipe Maia on February 1, 2013

I have authored a short paper about CXIDB: F. Maia, The Coherent X-ray Imaging Data Bank, Nature Methods, August 2012. It was published a while ago, but I forgot to announce it here!

Please cite this reference when you make use of the database.

One more dataset

Posted by Filipe Maia on August 30, 2012

A dataset from the article A. V. Martin et al., Femtosecond dark-field imaging with an X-ray free electron laser, Optics Express, June 2012 was kindly deposited.

It describes an interesting new technique for recovering as much information as possible from diffraction patterns with missing data.

The new dataset was given ID-19.

Fractal morphology dataset deposited

Two new datasets deposited

Posted by Filipe Maia on June 25, 2012

Two new datasets have just been published, one obtained using serial femtosecond crystallography and a second one using X-ray projection imaging on a randomly oriented mask.

The first dataset comes from the article by Sébastien Boutet et al., High-Resolution Protein Structure Determination by Serial Femtosecond Crystallography, Science, May 2012.

The second dataset corresponds to the data from the article Hugh T. Philipp et al., Solving structure with sparse, randomly-oriented x-ray data, Optics Express, June 2012.

You can find them under ID-17 and ID-18 respectively.

CXI version 1.3 released

Posted by Filipe Maia on April 20, 2012

The biggest change is the introduction of the concept of scans to accommodate datasets where one experimental parameter, such as wavelength or sample rotation, is continuously changed. A new axes attribute, related to scans, was introduced. The ptychography example was updated and now makes use of scans. There are also several small corrections and clarifications. As usual, the document can be found on github.

A new software page and the public release of CrystFEL.

Posted by Filipe Maia on March 21, 2012

The public release of CrystFEL spurred me to finally create a page dedicated to software useful for CXI experiments. You can find it under Resources.

CrystFEL was published by Thomas White et al. in a paper entitled CrystFEL: a software suite for snapshot serial crystallography in the latest issue of the J. of Appl. Crystallography. This software makes it possible to analyze data from serial femtosecond nanocrystallography experiments like that described in Femtosecond X-ray protein nanocrystallography by Henry Chapman et al.

You can obtain it at http://www.desy.de/~twhite/crystfel.

Recent paper in Nature argues for open computer programs.

Posted by Filipe Maia on March 12, 2012

D. Ince, L. Hatton and J. Graham-Cumming have published an interesting perspective in Nature entitled The case for open computer programs.

In it they argue that the full release of the source code of software involved in scientific publications is as important as releasing the experimental data. One of the major arguments for it is that the source code provides an exact description of the algorithm used, enabling reproducibility.

This is an important subject and I'm glad it's receiving greater attention.

Paper describing previously deposited LCLS dataset published

Posted by Filipe Maia on February 24, 2012

S. Kassemeyer et al. have recently published Femtosecond free-electron laser x-ray diffraction data sets for algorithm development, in Optics Express, describing in detail the dataset from LCLS that has been previously deposited.

This dataset corresponds to IDs 10 to 14 and their records have now been updated to reflect the publication.

3D X-ray diffraction imaging dataset deposited

Posted by Filipe Maia on November 15, 2011

125 diffraction images of a silicon nitride pyramid sprinkled with gold balls, taken at different tilts 1 degree apart, have just been deposited. These are the experimental data used in the article Chapman, H.N. et al. High-resolution ab initio three-dimensional x-ray diffraction microscopy, Journal of the Optical Society of America A, 23, 1179-200, May 2006.

You can find it here.

Raw data from LCLS dataset available online

Posted by Filipe Maia on October 21, 2011

The raw data corresponding to IDs 10 to 14, the large LCLS dataset recently deposited, are now available online.

The files are in XTC format, the native LCLS format, and they are very large so we recommend that you use a dedicated program to download them, not a web browser. If you plan to download a large fraction of the data please contact us so we can arrange a more efficient method.

The data can be accessed from a link at the bottom of each entry.
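For readers who prefer to script the transfer rather than use a dedicated download program, the following is a minimal sketch of streaming a large file to disk in fixed-size chunks with Python's standard library, so the whole file is never held in memory. The function names, the chunk size and the timeout are assumptions for illustration, and no real CXIDB URL is used; the demonstration at the end runs on an in-memory stream instead of a network connection.

```python
import io
import urllib.request

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def copy_in_chunks(source, sink, chunk_size=CHUNK_SIZE):
    """Copy a binary stream piece by piece and return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def download(url, path, timeout=600):
    """Stream a URL to a local file (hypothetical helper, not a CXIDB tool).

    The generous timeout is an assumption meant to tolerate slow replies.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(path, "wb") as out:
            return copy_in_chunks(response, out)


# Offline demonstration: an in-memory stream stands in for the server reply.
fake_response = io.BytesIO(b"x" * 2500)
local_copy = io.BytesIO()
copied = copy_in_chunks(fake_response, local_copy, chunk_size=1000)
print(copied)  # 2500
```

A production transfer would also want resuming and integrity checks, which is why a dedicated program remains the recommended route.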

Missing pixels in LCLS dataset

Posted by Filipe Maia on September 21, 2011

There was an error during the conversion of the LCLS dataset which caused one row of pixels to be missing. This has now been corrected. If you have datasets with 511 pixels in one dimension please download them again. Thanks to Chun Yoon for pointing it out.
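A quick way to check whether a local copy is affected is to look at the array shape of each image. The sketch below flags any shape that has 511 pixels along one dimension; the function name and the example shapes are hypothetical, and reading the shape from your own files is left to whichever library you already use.

```python
def needs_redownload(shape, bad_size=511):
    """Return True if any dimension of the image shape has the faulty size."""
    return bad_size in tuple(shape)


# Hypothetical example shapes, used only to exercise the check.
affected = needs_redownload((511, 512))
unaffected = needs_redownload((512, 512))
print(affected, unaffected)  # True False
```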

Large LCLS dataset deposited

Posted by Filipe Maia on August 21, 2011

Five datasets containing selected runs from two LCLS experiments, AMO15010 and AMO10510, have been added to the data bank. The datasets contain diffraction images of nanorice, magnetosomes, TMV, T4 and PBCV.

You can find the links from the Browse Data page as usual.

Nanorice data from FLASH experiments deposited

Posted by Filipe Maia on August 1, 2011

A large number of diffraction images from iron oxide ellipsoids (often referred to as nanorice) have been deposited. These include the images used in the article N. D. Loh et al., Cryptotomography: reconstructing 3D Fourier intensities from randomly oriented single-shot diffraction patterns, Phys. Rev. Lett., 104, 225501, June 2010.

Labeled Yeast data added to the Data Bank

Posted by Filipe Maia on July 22, 2011

The data used in the article Johanna Nelson et al., High-resolution x-ray diffraction microscopy of specifically labeled yeast cells, PNAS, 16, 7235-7239, April 2010, has been added to the data bank. Five different tilts of the same sample correspond to IDs 4 through 8. ID 4 also includes the reconstructed image.

You can access them through the Browse Data page.

CXI file format updated

Posted by Filipe Maia on July 21, 2011

There have been a few changes to the CXI file format, the most substantial ones being requiring SI units for all quantities and a new method to describe the orientation of CCD detectors. Check the file format page for more details.

New entry in the database

Posted by Filipe Maia on February 25, 2011

The data corresponding to the first experimental demonstration of the diffraction before destruction concept have been added to the database.

The group, which presented their findings in the article Henry N. Chapman et al., Femtosecond diffractive imaging with a soft-X-ray free-electron laser, Nature Physics, 2, 839-843, November 2006, has now deposited the data which was assigned ID 3.

Science highlights data access and organization

Posted by Filipe Maia on February 22, 2011

Science has recently published a special issue entitled "Dealing with Data".

The issue (which is available for free) focuses on the challenges posed by the current data deluge, including the difficulty of managing and organizing such large quantities of data, as well as the effort wasted on data duplication due to the lack of funding for preserving data.

I think this is both highly relevant to this website and well worth a read.

Mimivirus data inaugurates the Data Bank

Posted by Filipe Maia on February 2, 2011

The background-corrected data and configuration files used in the article M. Marvin Seibert, Tomas Ekeberg, Filipe R. N. C. Maia et al., Single mimivirus particles intercepted and imaged with an X-ray laser, Nature, 470, 78-81, February 2011, are the first two entries in the data bank, with IDs 1 and 2.

You can also access them through the Browse Data page.

CXI File Format Draft Version 1.0 Uploaded

Posted by Filipe Maia on January 11, 2011

I have just uploaded the CXI File Format specification document, Draft Version 1.0. Please let us know your opinions and criticisms.

Citing CXIDB

If you make use of CXIDB for your publication, please cite:

Maia, F. R. N. C. The Coherent X-ray Imaging Data Bank. Nat. Methods 9, 854–855 (2012).

The continued growth of CXIDB depends on the support of institutions and research grants which ultimately use citations as a measure of quality.