Tutorial: MalariaMine

If you have any problems running this tutorial please contact support [at] flymine.org.

This tutorial explains how to make a new biological data mine using the InterMine.bio system. The example is MalariaMine, a mine for Plasmodium falciparum. It shows how to configure existing InterMine.bio sources for a new organism and experimental data. The configuration files and data required to build MalariaMine are all found in bio/tutorial/malariamine. The tutorial steps through copying files from that directory one at a time to explain the purpose of each; alternatively, you can copy the whole bio/tutorial/malariamine directory to your local malariamine directory.

For an example of the completed MalariaMine see http://www.flymine.org/malariamine

I. Building the data warehouse

  1. Check that you have the required software installed and configured (see prerequisites).
  1. Get the InterMine software
  1. Create directories for your new mine
    • Create a /malariamine directory to run this tutorial from, at the same level as /flymine
    • Create directories for the sub-projects in the new mine that are required for building the data warehouse. These sub-projects are:
      • dbmodel - deals with merging model additions from selected sources and creating the production database schema
      • integrate - runs targets to build the data warehouse from source data
      • postprocess - operations to run on the completed data warehouse, such as setting sequences for genome features
        mkdir malariamine/dbmodel/
        mkdir malariamine/dbmodel/lib/
        mkdir malariamine/dbmodel/resources/
        mkdir malariamine/dbmodel/src/
        mkdir malariamine/integrate
        mkdir malariamine/integrate/lib/
        mkdir malariamine/integrate/resources/
        mkdir malariamine/integrate/src/
        mkdir malariamine/postprocess
        mkdir malariamine/postprocess/lib/
        mkdir malariamine/postprocess/resources/
        mkdir malariamine/postprocess/src/
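The same directory layout can also be created with a short loop. The following is a minimal sketch, not part of the InterMine distribution; the function name create_mine_dirs is hypothetical, and it assumes you run it from the directory that should contain malariamine.

```shell
#!/bin/sh
# Hypothetical helper (not part of InterMine): create lib/, resources/ and src/
# under each sub-project needed to build the data warehouse.
create_mine_dirs() {
    mine_dir="$1"
    for project in dbmodel integrate postprocess; do
        for sub in lib resources src; do
            # -p also creates missing parent directories and ignores existing ones
            mkdir -p "$mine_dir/$project/$sub" || return 1
        done
    done
}

create_mine_dirs malariamine
```

Because mkdir -p does not fail on directories that already exist, the helper can be re-run safely.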
        
  1. Configure
    • You need to set up several properties files that determine the location of your database, etc.
    • You can copy these files from bio/tutorial/malariamine
    • In the malariamine.properties file, edit the serverName, user and password properties to match your PostgreSQL login details.
    filename                                location                                 purpose
    malariamine.properties                  your home directory                      database locations and login details
    default.intermine.integrate.properties  /malariamine                             configures ObjectStore
    project.xml                             /malariamine                             names and locations of the datasources to be loaded
    genomic_priorities.properties           /malariamine/dbmodel/resources           describes how to resolve conflicting data when integrating
    genomic_keyDefs.properties              /malariamine/dbmodel/resources           lists the identifiers used when integrating new data
    objectstoresummary.config.properties    /malariamine/dbmodel/resources           configures fields to appear as a dropdown in forms
    project.properties                      each of dbmodel, integrate, postprocess  sets project-specific properties
    build.xml                               each of dbmodel, integrate, postprocess  ant build file
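Before editing, it can help to list the lines in malariamine.properties that carry the three settings named above. This is a minimal sketch, assuming the file sits in your home directory as the table states; the function name show_db_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line of a properties
# file whose text contains serverName, user or password.
# Note: "user" also matches keys that merely contain it (e.g. "userprofile").
show_db_settings() {
    grep -n -E 'serverName|user|password' "$1"
}

# Example call (file location taken from the table above):
#   show_db_settings "$HOME/malariamine.properties"
```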
  1. Create the databases
    • Create PostgreSQL databases for temporary items and for the final production database (as specified in the malariamine.properties file):
         createdb common-src-items
         createdb common-tgt-items
         createdb production-malaria
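The three createdb calls can also be wrapped in a loop that stops at the first failure. This is a sketch only: create_mine_dbs and the CREATEDB override are hypothetical names introduced here so that the loop can be dry-run before touching PostgreSQL.

```shell
#!/bin/sh
# Hypothetical helper: create the two temporary items databases and the
# production database named in this tutorial.
# CREATEDB is an assumed override; it defaults to PostgreSQL's createdb.
create_mine_dbs() {
    for db in common-src-items common-tgt-items production-malaria; do
        "${CREATEDB:-createdb}" "$db" || return 1
    done
}

# Dry run (prints the database names only):  CREATEDB=echo create_mine_dbs
# Real run:                                   create_mine_dbs
```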
      
  1. Build the database
    • The PostgreSQL database to use is specified in the malariamine.properties file and will need to be created first (see above).
    • Building the database removes any existing data from production-malaria, and it must be repeated each time the integration is started from scratch.
    • This step reads the list of sources from 'malariamine/dbmodel/build.xml' and merges the model additions (specified in '*_additions.xml' files) into the core data model. Each source can add classes and fields to the model.
         # in malariamine/dbmodel:
         ant clean
         ant build-db
      
  1. Set up data to be integrated
  2. Load data into your database
    • Run ant in the integrate directory
    • On a machine with 4 GB of RAM (running PostgreSQL and Java on the same machine) this takes about 90 minutes to complete.
    • Alternatively, use the project_build script.
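The database build and the integration run can be chained so that a failure in one stage stops the next. This is a minimal sketch under stated assumptions: build_warehouse and the ANT override are hypothetical names, and the ant targets are only those given in this tutorial. It is not the project_build script.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the production schema, then run the integration.
# WARNING: build-db removes any existing data from the production database.
# ANT is an assumed override (default: ant) that allows a dry run with ANT=echo.
build_warehouse() {
    mine_dir="${1:-malariamine}"
    ant_cmd="${ANT:-ant}"
    # Each stage runs in a subshell so the caller's working directory is unchanged.
    (cd "$mine_dir/dbmodel" && "$ant_cmd" clean && "$ant_cmd" build-db) || return 1
    (cd "$mine_dir/integrate" && "$ant_cmd") || return 1
}

# Dry run:  ANT=echo build_warehouse malariamine
# Real run: build_warehouse malariamine
```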
  3. Run postprocessing steps on the integrated database
    • Postprocessing operations are those performed on the integrated data before releasing a webapp.
    • For example, setting sequences of LocatedSequenceFeatures, filling in additional references and collections or retrieving publication details from PubMed.
    • NOTE - this is done automatically after integration if you use the project_build script.

II. Deploying the web application

  1. Get the software
    • Copy the malariamine webapp configuration from bio/tutorial/malariamine/webapp to your local directory, malariamine/webapp.
    • You should get these files and directories:
        malariamine/webapp/build.xml
        malariamine/webapp/lib
        malariamine/webapp/project.properties
        malariamine/webapp/resources
        malariamine/webapp/src
      
  1. Configure
    • You need to configure a couple of properties files that determine the location of your webapp, etc.
    • You can copy these files from bio/tutorial/malariamine
    • Add details of your local Tomcat installation to the build.properties.malariamine file.
    filename                             location                        purpose
    build.properties.malariamine         your home directory             configures webapp settings and deployment
    default.intermine.webapp.properties  /malariamine                    default InterMine properties for the webapp
    class_keys.properties                /malariamine/dbmodel/resources  specifies keys for classes in the data model
  1. Create userprofile database
    • This database is used while the webapp is running to store templates, saved queries and login information. The database name is configured in malariamine.properties.
      createdb userprofile-malaria
      
    • Create tables in the userprofile database, load some example template queries and create the superuser login:
      # In malariamine/webapp:
      ant build-db-userprofile
      
  1. Release your website
    • Compile and build the webapp .war file. This fetches the model from the database, compiles the model Java code and summarises the contents of the database:
      # In malariamine/webapp:
      ant default remove-webapp release-webapp
      
  1. Use your website
    • Test the released webapp by accessing tomcat_server:port/malariamine, e.g. localhost:8080/malariamine
    • Tomcat deploys your webapp to the path defined in build.properties.malariamine.

See also: GettingStarted, MineHowTo, Customise your website