About MindTrails Data

Back to the main menu

A command-line tool that handles data exporting, decrypting, and basic checking for MindTrails and MindTrails-like websites. It supports collecting data from multiple websites and also contains a toolbox for data analysis.

Environment Note: For Mac users, please make sure your Xcode version is up to date so that installation completes successfully.

Basic idea

The main idea of MTData is three-fold:

  1. export encrypted data as json files
  2. decode json files locally into csv files
  3. report basic missing data measurements based on the decrypted csv files

At the end of the day you will have raw backup json data and ready-to-use csv data for your analysis. You can also refer to the automatically generated reports and logs for data integrity issues.

Examples

# download and delete all the deleteable questionnaire entries on multiple servers
$ MTData export . .

# or

$ MTData export

# download all the questionnaire entries that should not be deleted from the templeton server.
$ MTData export templeton static

# decode all the questionnaires for the mindtrails server.
$ MTData decode mindtrails .

# Generate data checking tables that calculate the percentage of missing data for each column, by questionnaire, for all servers.
$ MTData report scale

# Generate data checking tables that calculate the percentage of missing questionnaires for each participant in the mindtrails project (server).
$ MTData report client mindtrails

You can also create a simple bash script with these tools to set up an export, decode, and report schedule (see "Setup routine" below).

Getting Started

Download

You can download it from the GitHub repository or type this in your command line:

$ git clone https://github.com/Diheng/MTData.git

Installation

Create a virtual environment with python 2.7.12+, and install dependencies (see http://docs.python-guide.org/en/latest/dev/virtualenvs/)

    $ virtualenv venv           
    $ source venv/bin/activate
    $ pip install -r requirements.txt
    $ python setup.py install
    $ cd config
    $ cp server.config.example server.config
    $ cp log.config.example log.config

Note: For documentation of virtualenv, please see the guide linked above; for help setting up the environment on Windows, please see this blog post: http://timmyreilly.azurewebsites.net/python-pip-virtualenv-installation-on-windows/. Windows users will also need to install Git.

Configuration

Create the needed config and key files. They should be placed in folders within MTData, like this:

MTData/
├── MTData           <- actual code
│   ├── com.py
│   ├── export.py
│   ├── export_old.py
│   ├── helloworld.py
│   ├── recovery.py
│   └── scales.py
├── README.md
├── bin
│   └── martin.sh
├── config           <- configuration files. In actual use, use *.config instead of *.config.sample
│   ├── export.config
│   ├── log.config
│   ├── recovery.config
│   ├── recovery_log.config
│   └── server.config
├── docs
├── keys             <- Keys for decrypting
│   ├── private_2.pem
│   └── private_1.pem
├── requirements.txt
├── setup.py
└── tests

Edit server.config, log.config and recovery_log.config.

Warning: Failing to set up the config files correctly will probably lead to unexpected errors. For example, MTData might return a very confusing error such as "export is not a MTData command, do you mean?…". This goes away once all three config files are configured correctly.

Here is an example of server.config with comments:

# create a new block for each new study you launch. Assign a name to it.
name_of_server1:

  # The READY variable tells MTData whether it should export data for this study. Change it to True when you are ready.
  READY: False

  # DELETE_MODE tells MTData whether it should delete deleteable entries on the server.
  DELETE_MODE: False

  # SERVER: where you host your study. Remember to add '/api/export' at the end of the url.
  SERVER: 'https://MindTrails.virginia.edu/api/export'

  # Put in the account information of an admin account.
  USER:
  PASS:

  # Name of the key file for decrypting; the actual file should be in the MTData/keys folder.
  PRIVATE_FILE:

  # This is for the time stamp on output csv files. You don't need to change it.
  DATE_FORMAT: "%b_%d_%Y"
  TIME_FORMAT: "%H_%M_%S"

  # Absolute path of the folder where you want to store your exported data. You should make a separate folder for each study.
  PATH: "/Users/X/Data_pool/name_of_server1/"

name_of_server2:
  READY: True
  DELETE_MODE: False
  SERVER: 'http://localhost:9000/api/export'
  USER:
  PASS:
  PRIVATE_FILE: 'key_for_decrypt.pem'
  DATE_FORMAT: "%b_%d_%Y"
  TIME_FORMAT: "%H_%M_%S"
  PATH: "/Users/Diheng/Box Sync/TEST_Diheng/"
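
For reference, the DATE_FORMAT and TIME_FORMAT values are standard Python strftime patterns used to time-stamp the output csv files; for example:

    from datetime import datetime

    now = datetime(2017, 3, 9, 14, 5, 30)
    print(now.strftime("%b_%d_%Y"))   # Mar_09_2017
    print(now.strftime("%H_%M_%S"))   # 14_05_30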

Note for ‘READY’: you can override READY: False in report and decode by specifying the server name, but you can never export a server’s data while READY is False.

Note for ‘deleteable’: in MindTrails, every table has a ‘deleteable’ attribute. ‘deleteable’ is True when the table contains sensitive data that you don’t want to keep on your front-end server and therefore needs to be downloaded and deleted frequently (for example, every 5 minutes). ‘deleteable’ is False when the table is constantly needed by the online study (for example, baseline scores used for alerts, or task logs needed for reference).
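
The private .pem file named in PRIVATE_FILE is what the decode step uses to decrypt exported entries. Purely as an illustration of that idea (not the actual MTData implementation; the real padding scheme and field layout live in MTData/export.py and may differ), here is a minimal RSA decryption sketch using the cryptography package, with a hypothetical base64-encoded field value:

    import base64
    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    # Load the private key referenced by PRIVATE_FILE in server.config
    # (file name and lack of passphrase are illustrative assumptions).
    with open("keys/private_1.pem", "rb") as f:
        private_key = serialization.load_pem_private_key(
            f.read(), password=None, backend=default_backend())

    def decrypt_field(encoded_ciphertext):
        """Decrypt one base64-encoded field value (OAEP padding is an assumption)."""
        ciphertext = base64.b64decode(encoded_ciphertext)
        return private_key.decrypt(
            ciphertext,
            padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                         algorithm=hashes.SHA256(),
                         label=None),
        )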

Here is an example of log.config with comments:


Not yet done, please see log.config.sample for now.
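
In the meantime, assuming (as the LOG_CFG note at the bottom of this page suggests) that the logging setup is a YAML file in Python's dictConfig format, loading it is only a few lines; the file path and logger name below are illustrative:

    import logging.config
    import yaml

    # Hand the YAML logging configuration to the standard library
    # (see log.config.sample for the real keys).
    with open("config/log.config") as f:
        logging.config.dictConfig(yaml.safe_load(f))

    logging.getLogger("MTData").info("logging configured")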

Structure your data folders

Create folders for your studies. They should look like this:

Data_pool
├── logs    <- All websites share one log folder
├── name_of_server1  
│   ├── active_data     <- for the csv files we decoded from raw_data
│   ├── raw_data        <- for the json files we saved from export. The BenchMark file sits here as well.
│   └── reports         <- for the data checking report we generated based on active_data
├── name_of_server2
│   ├── active_data
│   ├── raw_data
│   └── reports
└── name_of_server3
    ├── active_data
    ├── raw_data
    └── reports
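
If you prefer not to create these folders by hand, a few lines of Python (or an equivalent mkdir -p loop) will lay out the same structure; the Data_pool location and server names below are the ones from the example and should be adjusted to match the PATH entries in your server.config:

    import os

    data_pool = os.path.expanduser("~/Data_pool")   # match the PATH values in server.config
    servers = ["name_of_server1", "name_of_server2", "name_of_server3"]

    # one shared log folder, plus active_data / raw_data / reports per server
    folders = [os.path.join(data_pool, "logs")]
    for server in servers:
        folders += [os.path.join(data_pool, server, sub)
                    for sub in ("active_data", "raw_data", "reports")]

    for folder in folders:
        if not os.path.isdir(folder):
            os.makedirs(folder)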

Setup routine

Once you have finished installation and configuration, you can write your own bash file and set up your own data management routine. First, write a download.sh:

#!/bin/bash
# download all deleteable data from all servers.
MTData export . .

Then you edit your crontab by:

$ crontab -e

Add a line to crontab:

*/5 * * * * /Path/to/your/download.sh

Similarly, you can create routines for the decode and report steps on whatever schedule you need.

Current Usage

export

$ MTData export [serverName, default=.(All)] [scaleName, default=.(All) or static]

decode

$ MTData decode [serverName, default=.(All)] [scaleName, default=.(All)]

report

$ MTData report client [serverName, default=.(All)]
$ MTData report scale [serverName, default=.(All)]
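
The scale report is essentially a per-column missing-data percentage computed over the decoded csv files. A minimal pandas sketch of that calculation (the csv path is illustrative, not a file MTData necessarily creates under that exact name):

    import pandas as pd

    # one decoded questionnaire from active_data (path is illustrative)
    df = pd.read_csv("Data_pool/name_of_server1/active_data/OA_Mar_09_2017.csv")

    # percentage of missing values in each column
    missing_pct = df.isnull().mean() * 100
    print(missing_pct.round(1))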

tools

$ MTData status OA  filepath
$ MTData status all serverName
$ MTData clean OA filepath
$ MTData clean all serverName
$ MTData scores OA filepath
$ MTData scores all serverName
$ MTData LongToWide OA filepath
$ MTData LongToWide all serverName

TODO

  1. Finish all the basic functions (export, decode, report)
  • export <- Done.
  • decode <- Done.
  • report <- Done, not yet tested*
  2. Make the functions into command-line tools and test them. <- Done.
  3. Make the code more concise by re-using (instead of copy&pasting) methods. <- not yet done.
  4. Redesign the logging system. <- not yet done.
  5. Deploy to a server and test it on the command line (currently, the .py files are called with python). <- not yet done.
  6. Update documentation. <- Done.

Extra: Toolbox for data analysis

We could add small tools that make our data analysis less boring and a lot faster. For example, almost all the questionnaires need to be scored and transformed, so we have a scale.py that defines the common actions shared by every scale. Each scale could have its own definition of actions as well.

Example:

  • tools.py <- tools that can be used in different situations.
  • Scale.py <- has the score and trans functions; I used it for scoring. Needs to be extended* (see the sketch below).
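
As a flavour of what such a scoring helper can look like, here is a minimal pandas sketch that sums item columns into a total score; the questionnaire name, item columns, and reverse-scored item are illustrative, not the actual Scale.py interface:

    import pandas as pd

    def score_scale(df, items, reverse_items=(), max_value=5):
        """Return a per-row total score; reverse-scored items are flipped first."""
        scored = df[list(items)].copy()
        for col in reverse_items:
            scored[col] = max_value + 1 - scored[col]   # e.g. 1..5 becomes 5..1
        return scored.sum(axis=1)

    # illustrative column names for an 'OA' questionnaire csv
    oa = pd.read_csv("active_data/OA.csv")
    oa["OA_total"] = score_scale(oa, items=["OA_1", "OA_2", "OA_3", "OA_4"],
                                 reverse_items=["OA_2"])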

Also, we could write functions that do the basic analyses we need from time to time. For example, we need to generate an attrition rate report pretty often. Diheng has Python code that works with pandas, and Sam probably has tons of R code as well, which could be turned into small Python tools pretty easily.*
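
For instance, a first pass at an attrition-style summary is only a few lines of pandas; the participant csv name and the 'current_session' column below are illustrative, not the actual MindTrails table layout:

    import pandas as pd

    # illustrative participant table with one row per participant and a
    # column recording the furthest session each participant reached
    participants = pd.read_csv("active_data/participant.csv")

    # percentage of participants whose furthest point is each session
    session_pct = participants["current_session"].value_counts(normalize=True) * 100
    print(session_pct.round(1))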

Let Diheng or Sam know if you would like to work on getting the most frequently used analysis codes into small tools.

All the items that end with a * would appreciate help!

What is done

  • Read and write data
  • Error alert
  • Error logs
  • Normal running logs
  • Added points where errors can be skipped
  • Save the raw data
  • Deleting the raw data
  • Write up the bash code to automatically run export.py regularly.
  • Write a Recovery program to recover data from raw data files.

Note to myself: LOG_CFG=my_logging.yaml python my_server.py

Back to the main menu