Tutorial: MalariaMine
If you have any problems running this tutorial please contact support [at] flymine.org.
This tutorial explains how to make a new biological data mine using the InterMine.bio system. The example used is MalariaMine, a mine for Plasmodium falciparum. It shows how to configure existing InterMine.bio sources for a new organism and experimental data. The configuration files and data required to build MalariaMine are all found in bio/tutorial/malariamine. The tutorial steps through copying files from there one at a time to explain the purpose of each; alternatively, you can copy the whole bio/tutorial/malariamine directory to your local malariamine directory.
For an example of the completed MalariaMine see http://www.flymine.org/malariamine
I. Building the data warehouse
- Check that you have the required software installed and configured; see prerequisites.
- Get the InterMine software
- Create directories for your new mine
- Create a /malariamine directory to run this tutorial from, at the same level as /flymine
- Create directories for the sub-projects in the new mine that are required for building the data warehouse. These sub-projects are:
- dbmodel - deals with merging model additions from selected sources and creating the production database schema
- integrate - runs targets to build the data warehouse from source data
- postprocess - operations to run on the completed data warehouse, such as setting sequences for genome features
mkdir malariamine/dbmodel/
mkdir malariamine/dbmodel/lib/
mkdir malariamine/dbmodel/resources/
mkdir malariamine/dbmodel/src/
mkdir malariamine/integrate
mkdir malariamine/integrate/lib/
mkdir malariamine/integrate/resources/
mkdir malariamine/integrate/src/
mkdir malariamine/postprocess
mkdir malariamine/postprocess/lib/
mkdir malariamine/postprocess/resources/
mkdir malariamine/postprocess/src/
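The same directory layout can also be created with a short loop. This is only a sketch: the function name make_mine_dirs is made up for illustration, and it assumes a POSIX shell with mkdir -p available.

```shell
# Create the dbmodel, integrate and postprocess sub-project directories,
# each with lib, resources and src, under the given mine directory.
# make_mine_dirs is a hypothetical helper, not part of InterMine.
make_mine_dirs() {
    mine_dir="${1:-malariamine}"
    for project in dbmodel integrate postprocess; do
        for sub in lib resources src; do
            # -p also creates the parent directories and ignores existing ones
            mkdir -p "$mine_dir/$project/$sub"
        done
    done
}
```

Calling make_mine_dirs malariamine from the directory that contains /flymine should give the same result as the individual mkdir commands.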
- Configure
- You need to set up several properties files that determine the location of your database, etc.
- You can copy these files from bio/tutorial/malariamine
- In the malariamine.properties file, edit the serverName, user and password properties to match your PostgreSQL login details.
filename | location | purpose
malariamine.properties | your home directory | database locations and login details
default.intermine.integrate.properties | /malariamine | configures ObjectStore
project.xml | /malariamine | names and locations of the datasources to be loaded
genomic_priorities.properties | /malariamine/dbmodel/resources | describes how to resolve conflicting data when integrating
genomic_keyDefs.properties | /malariamine/dbmodel/resources | lists the identifiers used when integrating new data
objectstoresummary.config.properties | /malariamine/dbmodel/resources | configures fields to appear as a dropdown in forms
project.properties | one for each new project directory (dbmodel, integrate and postprocess) | sets project-specific properties
build.xml | one for each new project directory (dbmodel, integrate and postprocess) | ant build file
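Before moving on, it can help to confirm that malariamine.properties actually contains the serverName, user and password properties mentioned above. The helper below is a rough sketch: check_db_properties is a made-up name, and the pattern assumes each key appears either on its own or as the last part of a dotted property name, which is an assumption about the file layout rather than something this tutorial specifies.

```shell
# Hypothetical helper: report which of the properties named in the tutorial
# cannot be found in a properties file. Returns 0 if all three are present.
check_db_properties() {
    props_file="$1"
    missing=0
    for key in serverName user password; do
        # Matches "serverName=value" or "some.prefix.serverName=value";
        # the dotted prefix is an assumption about the file layout.
        if ! grep -Eq "(^|\.)${key}[[:space:]]*=" "$props_file"; then
            echo "missing property: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_db_properties "$HOME/malariamine.properties" prints one line per property it cannot find. It does not check that the values are correct, only that the keys are there.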
- Create the databases
- Create PostgreSQL databases for temporary items and for the final production database (as specified in the malariamine.properties file):
createdb common-src-items
createdb common-tgt-items
createdb production-malaria
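If you rerun the tutorial, some of these databases may already exist. The sketch below creates only the ones that are missing. The function name create_mine_dbs is made up for illustration, and it assumes psql and createdb can connect using your default PostgreSQL settings.

```shell
# Create each database named in the tutorial unless it already exists.
# create_mine_dbs is a hypothetical helper, not part of InterMine.
create_mine_dbs() {
    for db in common-src-items common-tgt-items production-malaria; do
        # psql -lqt lists databases one per row; the first column is the name.
        if psql -lqt | cut -d '|' -f 1 | sed 's/^ *//; s/ *$//' | grep -Fxq "$db"; then
            echo "exists, skipping: $db"
        else
            createdb "$db" && echo "created: $db"
        fi
    done
}
```

The database names are the ones used above; if your malariamine.properties uses different names, the list in the loop needs to match them.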
- Build the database
- The PostgreSQL database to use is specified in the malariamine.properties file and will need to be created first (see above).
- Building the database removes any existing data from production-malaria, and it needs to be done each time the integration is started from scratch.
- This step reads the list of sources from 'malariamine/dbmodel/build.xml' and merges the model additions (specified in '*_additions.xml' files) into the core data model. Each source can add classes and fields to the model.
# in malariamine/dbmodel:
ant clean
ant build-db
- Set up data to be integrated
- Load data into your database
- Run ant in the integrate directory
- On a machine with 4 GB of RAM (running PostgreSQL and Java on the same machine) this takes about 90 minutes to complete.
- Alternatively, use the project_build script.
- Run postprocessing steps on the integrated database
- Postprocessing operations are those performed on the integrated data before releasing a webapp.
- For example, setting sequences of LocatedSequenceFeatures, filling in additional references and collections or retrieving publication details from PubMed.
- NOTE - this is done automatically after integration if you use the project_build script.
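The integration and postprocessing steps can be run one after the other from a small wrapper. This is a sketch under assumptions: build_warehouse is a made-up name, and while the tutorial states that integration is started by running ant in the integrate directory, running plain ant in the postprocess directory is assumed here and should be checked against your postprocess build.xml.

```shell
# Run integration, then postprocessing, stopping at the first failure.
# Assumption: plain `ant` in <mine>/postprocess starts the postprocessing
# steps; the tutorial only states this explicitly for <mine>/integrate.
build_warehouse() {
    mine_dir="${1:-malariamine}"
    # Each step runs in a subshell so the caller's working directory is kept.
    (cd "$mine_dir/integrate" && ant) || return 1
    (cd "$mine_dir/postprocess" && ant) || return 1
}
```

If integration fails, the function returns without starting postprocessing, so a partially loaded database is not postprocessed by accident.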
II. Deploying the web application
- Get the software
- Copy the malariamine webapp configuration from bio/tutorial/malariamine/webapp to your local directory, malariamine/webapp.
- You should get these files and directories:
malariamine/webapp/build.xml
malariamine/webapp/lib
malariamine/webapp/project.properties
malariamine/webapp/resources
malariamine/webapp/src
- Configure
- You need to configure a couple of properties files that determine the location of your webapp, etc.
- You can copy these files from bio/tutorial/malariamine
- Add details of your local tomcat installation to the build.properties.malariamine file.
filename | location | purpose
build.properties.malariamine | your home directory | configures webapp settings and deployment
default.intermine.webapp.properties | /malariamine | default InterMine properties for the webapp
class_keys.properties | /malariamine/dbmodel/resources | specifies keys for classes in the data model
- Create userprofile database
- This database is used while the webapp is running to store templates, saved queries and login information. The database name is configured in malariamine.properties.
createdb userprofile-malaria
- Create tables in the userprofile database, load some example template queries and create the superuser login:
# In malariamine/webapp:
ant build-db-userprofile
- Release your website
- Compile and build the webapp .war file. This fetches the model from the database, compiles model java code and summarises the contents of the database:
# In malariamine/webapp:
ant default remove-webapp release-webapp
- Use your website
- Test the released webapp by accessing tomcat_server:port/malariamine, e.g. localhost:8080/malariamine
- Tomcat deploys your webapp to the path defined in build.properties.malariamine
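A quick way to confirm that the released webapp is responding is to request it from the command line. The sketch below assumes curl is installed; check_webapp is a made-up name, and the default URL is just the localhost:8080 example from above, so substitute your own Tomcat host, port and path.

```shell
# Return success if the webapp answers with a non-error HTTP status.
# The default URL is a placeholder taken from the tutorial's example.
check_webapp() {
    url="${1:-http://localhost:8080/malariamine}"
    # -f: fail on HTTP errors, -s: no progress meter, -S: still print errors,
    # -L: follow redirects, -o /dev/null: discard the page body
    curl -fsSL -o /dev/null "$url"
}
```

For example, check_webapp && echo "webapp is up" prints the message only when the request succeeds.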
See also: GettingStarted, MineHowTo, Customise your website
