Understanding the format of exported data
New option to export to .xlsx
In SurveyCTO v2.70.9, we added the option to export form data directly to .xlsx format. If you use Excel, you should always export to .xlsx format to avoid issues with importing .csv files.
SurveyCTO defaults to exporting data in .csv files – in a comma-separated-value format supported by just about any spreadsheet, database, or statistical analysis software – but also offers .xlsx and .sav export formats for Excel and SPSS users, respectively. For the main data file exported for each of your forms, the first row (also called the "header row") contains field or column names, and each additional row contains data for a specific submission (also called a "filled-out form" or "instance"). Every file includes columns that correspond to the fields in your form, plus the following extra columns:
KEY: a unique identifier for each submission, automatically assigned by SurveyCTO. You can use this key to uniquely identify and track each submission through your entire data pipeline.
instanceID: a duplicate copy of the submission's key (can be ignored).
formdef_version: the version number of the form definition used to fill out the submission. Use this column to make sense of data when your form has changed over time (for more on updating forms, see Updating an existing form).
SubmissionDate: the date and time that the submission was submitted to the SurveyCTO server. This may well differ from the date and time at which the form was initially filled out.
review_status (maybe): if the review and correction workflow is enabled for the form and you have chosen to export not-yet-approved submissions, then this column indicates the review status for each submission.
review_comments (maybe): if the review and correction workflow is enabled for the form, then this column includes any comments added to each submission.
review_corrections (maybe): if the review and correction workflow is enabled for the form, then this column includes a history of corrections made to each submission. (Whenever your exported data includes corrections, your export will also include a separate yourformid_correction_log.csv file that contains a complete record of corrections.)
review_quality (maybe): if the review and correction workflow is enabled for the form, then this column includes the quality classification for all reviewed submissions.
Except for some special cases described below, each field in your survey will be exported as its own separate column. The name of each column (the value in the header row of exported files) will simply be the name that you gave the field in your form definition – unless you are exporting data with SurveyCTO Desktop and have selected the option to include group names in column headers. In that case, column headers will include both field names and enclosing groups (e.g., consented-agriculture-crops for a field named crops that's inside groups named consented and agriculture).
Dates and times (like the SubmissionDate column discussed above) are always exported relative to the time zone of the exporting computer – not the time zone of the data-collection device. This is done as a convenience so that data collected in multiple time zones can be easily analyzed relative to a single fixed time zone. (In the case of data published directly from the server to the cloud, the time zone used is always UTC.)
GPS locations captured with geopoint fields are exported as four columns rather than just one; for example, for a field named gpsloc, the four columns would be: gpsloc-Latitude, gpsloc-Longitude, gpsloc-Altitude, and gpsloc-Accuracy (altitude and accuracy will be in meters – and set to exactly 0.0 when unavailable). GPS data collected with geoshape or geotrace fields export into single columns, each of which includes a list of points that represent the polygon or polyline saved; the list of points is separated by semi-colons, and each point has a latitude, a space, a longitude, a space, an altitude, a space, and an accuracy.
Multiple-choice fields that allow the user to choose more than one option (captured with select_multiple fields) are exported as a space-separated list of selected option values (like "1 3" if only options with values 1 and 3 were selected). By default, a series of 1/0 columns for each possible answer value (i.e., a series of "dummy variables") will also be added; for example, a single field named crops that has choice values 1, 2, and 3 would be exported with extra crops_1, crops_2, and crops_3 columns, with submission values of 1 if selected or 0 if not selected. If you don't need the extra columns and want to speed up your exports, you can disable the Export select_multiple responses as a series of 1/0 columns setting in SurveyCTO Desktop.
Fields inside repeat groups can have multiple values for a single submission, so they require the exported data to be structured a bit differently. Repeated data can be exported in either "long" or "wide" format:
Wide: in wide format, additional columns are simply added to the primary export spreadsheet in order to accommodate repeated data. For example, if a household roster included a repeated field named age, then the exported data file would include columns age_1, age_2, and so on; if the maximum number of household members in the data was 21, then there would be 21 age columns and 21 columns for every other field in the roster's repeat group. The server console's Export tab defaults to exporting in this format, and there is an option in SurveyCTO Desktop to include wide-format exports as well.
Long: in long format, a separate row is added for each repeat instance. Since the main export spreadsheet is structured to have only one row per submission, each repeat group exports into a separate data spreadsheet – so a form with repeat groups will export as a primary data spreadsheet plus additional data spreadsheet for each repeat group. Each row in a repeat group's data spreadsheet will have its own unique KEY value, plus a PARENT_KEY value that can be used to link to the primary submission or parent-group data spreadsheet. When exporting to .csv and .sav, each repeat group will be exported as its own file. When exporting to .xlsx, each repeat group will be exported to its own sheet within the workbook. The main form data will be located in the first sheet in that workbook, which will be named "data".
Exported .csv files are encoded in Unicode format (UTF-8) so that they can support the widest possible range of characters and scripts. Unfortunately, Microsoft Excel defaults to importing .csv files as Latin text, so some accents or other scripts may be distorted. If you use Excel, your best option is to always export form data in .xlsx format. See Looking at data in Microsoft Excel for details on how to safely import your data into Excel.
Finally, every SurveyCTO export includes a data export report to help consumers of that data contextualize it. See Data export reports for more on those reports.