Convert More Data
Introduction
Once you have made it through a full conversion workflow and published your data in ResearchSpace, what happens when you have more data to add? You have a few options, depending on how the new data differs from what you already converted.
Conversion Options
Edit your Data in ResearchSpace
Now that your data is in ResearchSpace, your team has access to ResearchSpace’s data editing and data creation features as explained in the Setup in ResearchSpace step. If your new data is a relatively small number of new entities or connections between them, then editing directly in ResearchSpace is a good option.
When you edit your data directly in ResearchSpace, you create a new version that diverges from your original, unconverted source data. If you want those changes to appear in copies of your data outside ResearchSpace, you will need to make the same changes there.
If you plan on continuing to make significant changes to your original data and want those changes to appear in ResearchSpace, reach out to LINCS to discuss options.
Rerun a Conversion Workflow on New Data
Use this option when you have a new batch of data that follows the same structure and contains the same relationships as your originally converted batch. What you need to repeat depends on the workflow you followed:
- Structured Data
- Semi-Structured Data
- TEI Data
- Natural Language Data
For structured data, an example would be new rows for the spreadsheet you converted originally.
Here is how each step will need to change and be repeated:
- Export Data
  - Repeat the same process.
- Clean Data
  - If you used a script, then run it on the new data.
  - If you made manual changes, you need to apply those to the new data. Note that if you used OpenRefine, you may be able to open the original project, export the change history from the Undo/Redo tab, and apply it to the new data.
- Reconcile Entities
  - Any entities that did not appear in the first batch need to be reconciled externally.
  - Any entities that appear in both the first batch and the new batch need to be reconciled against one another (i.e., use the same identifier for the same entity in both batches).
- Develop Conceptual Mapping
  - Because the structure of your data has not changed, you can reuse the same conceptual mapping from your original Develop Conceptual Mapping step.
- Implement Conceptual Mapping
  - The script or template you used to implement that conceptual mapping will need to be rerun.
  - If you used 3M, then you only need to replace the input file for your 3M mapping project with your new data and hit run in 3M.
- Validate and Enhance
  - Again, either the script you used or the manual changes you made will need to be repeated.
For semi-structured data, the repeatability of this workflow varies because the workflow looks different for every project.
If you know you will continue to receive new data in the same format, then ideally you will set up a repeatable set of steps that use automated scripts or templates. If that is not possible, then you will have to repeat the more manual process for each new batch of data.
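As a minimal sketch of what a repeatable step can look like, the Python below wraps one cleaning routine in a function so that the same logic runs on every batch. The column names (`name`, `birth_date`), the cleaning rules, and the sample row are hypothetical placeholders, not part of any LINCS tool. Substitute whatever your original cleaning step actually did.

```python
import csv
import io


def clean_row(row):
    """Apply the same cleaning rules to one record (hypothetical rules)."""
    # Trim leading and trailing whitespace from every field.
    cleaned = {key: value.strip() for key, value in row.items()}
    # Collapse repeated internal whitespace in the name column.
    cleaned["name"] = " ".join(cleaned["name"].split())
    return cleaned


def clean_batch(source):
    """Read CSV text from a file-like object and return the cleaned rows."""
    return [clean_row(row) for row in csv.DictReader(source)]


# Stand-in for a new export; in practice this would be an opened CSV file.
new_batch = io.StringIO("name,birth_date\n  Ada   Lovelace ,1815-12-10 \n")
print(clean_batch(new_batch))
```

Because the rules live in one function, each new batch only requires pointing `clean_batch` at a different file rather than repeating the edits by hand.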
New TEI data containing the same relationships can follow the same LINCS XTriples workflow as the first batch. As with the Structured Data Workflow, the rest of the steps will need to be redone, whether by rerunning scripts, reusing the same tools, or redoing manual work.
This process is likely to take the same amount of time as the original conversion.
New natural language data containing the same relationships can follow the same workflow as the first batch. As with the Structured Data Workflow, the rest of the steps will need to be redone, whether by rerunning scripts, reusing the same tools, or redoing manual work.
This process is likely to take the same amount of time as the original conversion.
For new data, you still need to reconcile against external sources, but now you also need to reconcile against your already converted data.
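Reconciling against your already converted data can be sketched as a lookup. The example below assumes you have kept a table of the identifiers assigned in the first batch. It splits the entities in a new batch into those that already have an identifier and those that still need external reconciliation. The function name, the example identifier, and the idea of matching on a normalized label are all illustrative assumptions. Real reconciliation usually needs more evidence than a matching name, plus human review.

```python
def reconcile_against_converted(new_labels, converted_ids):
    """Split new entity labels into already-identified and still-unreconciled.

    converted_ids maps a normalized label to the identifier used in the
    first batch. Matching on a normalized label is a simplifying assumption.
    """
    matched, unmatched = {}, []
    for label in new_labels:
        # Normalize case and whitespace before looking the label up.
        key = " ".join(label.lower().split())
        if key in converted_ids:
            matched[label] = converted_ids[key]
        else:
            unmatched.append(label)
    return matched, unmatched


# Hypothetical identifier carried over from the first batch.
converted_ids = {"ada lovelace": "http://example.org/entity/ada-lovelace"}

matched, unmatched = reconcile_against_converted(
    ["Ada  Lovelace", "Mary Somerville"], converted_ids
)
print(matched)    # entities that reuse an identifier from the first batch
print(unmatched)  # entities that still need external reconciliation
```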
Run a New Conversion Workflow on New Data
If you have new data that does not have the same starting structure as the data you originally converted, or if it contains many new relationships, then you will need to repeat the appropriate conversion workflow on the new data. You may be able to use an edited version of your original conversion workflow if there are similarities between the batches of data.
Publication Options
Your newly converted data can be combined with data you already have in ResearchSpace so that it appears as a single project and named graph in the LINCS triplestore.
Alternatively, if it covers new subject matter and is part of a different research project, it can be published as a separate project in ResearchSpace and be stored in a different named graph.
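To illustrate what storing data in the same or a different named graph means, the sketch below serializes one triple as N-Quads, where the fourth term names the graph. It is a conceptual illustration only: the graph and entity IRIs are hypothetical, and it does not describe how LINCS actually loads data into its triplestore.

```python
def to_nquads(triples, graph_iri):
    """Serialize (subject, predicate, object) IRI triples as N-Quads lines."""
    return [f"<{s}> <{p}> <{o}> <{graph_iri}> ." for s, p, o in triples]


# Hypothetical graph IRIs for illustration only.
PROJECT_GRAPH = "http://example.org/graph/my-project"
OTHER_GRAPH = "http://example.org/graph/other-project"

new_triples = [
    (
        "http://example.org/entity/mary-somerville",  # hypothetical entity
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://www.cidoc-crm.org/cidoc-crm/E21_Person",
    )
]

# Same graph IRI as the existing data: the new batch joins that project.
combined = to_nquads(new_triples, PROJECT_GRAPH)
# Different graph IRI: the new batch is kept apart as its own project.
separate = to_nquads(new_triples, OTHER_GRAPH)
print(combined[0])
print(separate[0])
```

The triples themselves are identical in both cases. Only the graph term changes, which is what lets a triplestore treat the batches as one project or as two.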