About MindTrails Data
======
*[Back to the main menu](index.html)*

A command-line tool that handles data exporting, decrypting, and basic checking for MindTrails and MindTrails-like websites. It supports collecting data from multiple websites and also includes a toolbox for data analysis.

Environment note for Mac users: please update your Xcode version to ensure the setup completes successfully.

Basic idea
---------
The main idea of MTData is three-fold:

1. ```export``` encrypted data as json files
2. ```decode``` json files locally into csv files
3. ```report``` basic missing-data measurements based on the decrypted csv files

At the end of the day you will have raw backup json data and ready-to-use csv data for your analysis. You can also refer to the automatically generated reports and logs for data integrity issues.

Examples
---------

```sh
# Download and delete all the deleteable questionnaire entries on multiple servers
$ MTData export . .
# or
$ MTData export

# Download all the questionnaire entries that should not be deleted from the templeton server.
$ MTData export templeton static

# Decode all the questionnaires for the mindtrails server.
$ MTData decode mindtrails .

# Generate data checking tables that calculate the percentage of missing data for each column, by questionnaire, for all servers.
$ MTData report scale

# Generate data checking tables that calculate the percentage of missing questionnaires for each participant in the mindtrails project (server).
$ MTData report client mindtrails
```

You can also write simple bash scripts around these tools to set up your export, decode, and report schedule.

Getting Started
============

Download
---------
You can download it [here](https://github.com/Diheng/MTData) or type this in your command line:

```sh
$ git clone https://github.com/Diheng/MTData.git
```

Installation
---------
Create a virtual environment with Python 2.7.12+ and install the dependencies (see http://docs.python-guide.org/en/latest/dev/virtualenvs/):

```bash
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python setup.py install
$ cd config
$ cp server.config.example server.config
$ cp log.config.example log.config
```

*Note:* For documentation of virtualenv, please see *[this guide](http://python-guide-pt-br.readthedocs.io/en/latest/dev/virtualenvs/)*; for help setting up the environment on Windows, please see *[this blog](http://timmyreilly.azurewebsites.net/python-pip-virtualenv-installation-on-windows/)*. Windows users will also need to install Git.

Configuration
---------
Create the needed config and key files. They should be placed in folders within MTData, like this:

```
MTData/
├── MTData                  <- actual code
│   ├── com.py
│   ├── export.py
│   ├── export_old.py
│   ├── helloworld.py
│   ├── recovery.py
│   └── scales.py
├── README.md
├── bin
│   └── martin.sh
├── config                  <- configuration files; in actual use these are *.config, not *.config.sample
│   ├── export.config
│   ├── log.config
│   ├── recovery.config
│   ├── recovery_log.config
│   └── server.config
├── docs
├── keys                    <- keys for decrypting
│   ├── private_2.pem
│   └── private_1.pem
├── requirements.txt
├── setup.py
└── tests
```

Edit server.config, log.config, and recovery_log.config.

*Warning:* Failing to set up the config files will probably lead to unexpected errors. For example, MTData might return the confusing error "export is not a MTData command, do you mean?..." The error disappears once all three config files are configured correctly.
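Because these errors usually trace back to a missing or malformed config file, a quick sanity check before running MTData can save time. Here is a minimal sketch, not part of MTData itself; the file names follow the layout above, and the YAML check assumes PyYAML is available in your virtualenv:

```sh
# Run from the MTData/ directory: check that all three config files exist.
for f in config/server.config config/log.config config/recovery_log.config; do
    [ -f "$f" ] || echo "missing: $f"
done
# Check that server.config is valid YAML (assumes PyYAML is installed in the virtualenv).
python -c "import yaml; yaml.safe_load(open('config/server.config'))" && echo "server.config parses OK"
```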
Here is an example of server.config with comments:

```yaml
# Create a new block for each new study you launch, and assign it a name.
name_of_server1:
  # The READY variable tells MTData whether it should export data for this study. Change it to True when you are ready.
  READY: False
  # DELETE_MODE tells MTData whether it should delete deleteable entries on the server.
  DELETE_MODE: False
  # SERVER: where you host your study. Remember to add '/api/export' at the end of the url.
  SERVER: 'https://MindTrails.virginia.edu/api/export'
  # Put in the account information of an admin account.
  USER:
  PASS:
  # Name of the key file for decrypting; the actual file should be in the MTData/keys folder.
  PRIVATE_FILE:
  # These control the time stamp on output csv files. You don't need to change them.
  DATE_FORMAT: "%b_%d_%Y"
  TIME_FORMAT: "%H_%M_%S"
  # Absolute path of the folder where you want to store your exported data. Make a separate folder for each study.
  PATH: "/Users/X/Data_pool/name_of_server1/"

name_of_server2:
  READY: True
  DELETE_MODE: False
  SERVER: 'http://localhost:9000/api/export'
  USER:
  PASS:
  PRIVATE_FILE: 'key_for_decrypt.pem'
  DATE_FORMAT: "%b_%d_%Y"
  TIME_FORMAT: "%H_%M_%S"
  PATH: "/Users/Diheng/Box Sync/TEST_Diheng/"
```

**Note on 'READY':** You can override READY: False in ```report``` and ```decode``` by specifying the name of the server, but you *CANNOT* ```export``` a server's data while READY is False.

**Note on 'deleteable':** In MindTrails, every table has a 'deleteable' attribute. 'deleteable' is True when the table contains sensitive data that you don't want to keep on your front-end server and that therefore needs to be downloaded and deleted frequently (e.g., every 5 minutes). 'deleteable' is False when the table is needed constantly by the online study (e.g., baseline scores used for alerts, or task logs needed for reference).

Here is an example of log.config with comments:

```yaml
```

Not yet done; please see log.config.sample for now.

Structure your data folders
----------
Create folders for your studies. They should look like this:

```
Data_pool
├── logs                  <- all websites share one log folder
├── name_of_server1
│   ├── active_data       <- for the csv files decoded from raw_data
│   ├── raw_data          <- for the json files saved from export; the benchmark file sits here as well
│   └── reports           <- for the data checking reports generated from active_data
├── name_of_server2
│   ├── active_data
│   ├── raw_data
│   └── reports
└── name_of_server3
    ├── active_data
    ├── raw_data
    └── reports
```

Setup routine
----------
Once you have finished the installation and configuration, you can write your own bash file and set up your own data management routine.

First, write a **download.sh**:

```sh
#!/bin/bash
# download all deleteable data from all servers.
MTData export . .
```

Then edit your **crontab** with:

```sh
$ crontab -e
```

Add a line to the crontab:

```sh
*/5 * * * * /Path/to/your/download.sh
```

Similarly, you can create routines for the other steps.
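For example, a companion script for the slower decode and report steps could be scheduled the same way. A minimal sketch follows; the file name nightly.sh and the 2 a.m. schedule are just illustrative, and the commands follow the usage documented in the next section:

```sh
#!/bin/bash
# nightly.sh: decode newly exported raw data and refresh the data checking reports for all servers.
MTData decode . .
MTData report scale
MTData report client
```

and a matching crontab line:

```sh
0 2 * * * /Path/to/your/nightly.sh
```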
Current Usage
===========

export
---------

```sh
$ MTData export [serverName, default=.(All)] [scaleName = ./static/All]
```

decode
--------

```sh
$ MTData decode [serverName, default=.(All)] [scaleName, default=.(All)]
```

report
----------

```sh
$ MTData report client [serverName, default=.(All)]
$ MTData report scale [serverName, default=.(All)]
```

tools
---------

```sh
$ MTData status OA filepath
$ MTData status all serverName
$ MTData clean OA filepath
$ MTData clean all serverName
$ MTData scores OA filepath
$ MTData scores all serverName
$ MTData LongToWide OA filepath
$ MTData LongToWide all serverName
```

TODO
=========

1. Finish all the basic functions (export, decode, report)
   - export <- Done.
   - decode <- Done.
   - report <- Done, not yet tested*
2. Make the functions into command-line tools and test them. <- Done.
3. Make the code more concise by reusing methods instead of copy-and-pasting them. <- Not yet done.
4. Redesign the logging system. <- Not yet done.
5. Deploy to a server and test it on the command line (currently, the .py files are called with python). <- Not yet done.
6. Update documentation. <- Done.

*Extra: Toolbox for data analysis*

We could add small tools that make our data analysis less tedious and a lot faster. For example, almost all the questionnaires need to be scored and transformed, so we have a scale.py that defines the common actions shared by every scale. Each scale can also have its own definition of actions.

**Example:**

- tools.py <- tools that can be used in different situations.
- Scale.py <- has the score and trans functions; I used it for scoring. Needs to be extended*

We could also write functions for the basic analyses we need from time to time. For example, we need to generate an attrition rate report fairly often. Diheng has Python code that works with pandas, and Sam probably has plenty of R code as well, which could be turned into small Python tools fairly easily.*

**Let Diheng or Sam know if you would like to work on turning the most frequently used analysis code into small tools.**

*Help is appreciated on all the items that end with a \*!*

What is done
=========

* Read and write data
* Error alerts
* Error logs
* Normal running logs
* Added where to skip errors
* Save the raw data
* Delete the raw data
* Bash code to run export.py automatically on a regular schedule.
* Recovery program to recover data from the raw data files.

Note to myself: LOG_CFG=my_logging.yaml python my_server.py

*[Back to the main menu](index.html)*