This page was generated from examples/wehr/wehr-nick.ipynb.

Sattler#

Dataset: https://www.dropbox.com/sh/4adrgjsee60vcvj/AADJ-hbes1uHg3FE0et69sy5a?dl=1

Say we have these files…

[1]:
# first handle imports..
from pathlib import Path
from pprint import pprint

from onice_conversion import NWBConverter
from onice_conversion import spec
[2]:
# we've symlinked the example data folder to the cwd for this example
base_path = Path.cwd() / '2021-02-26_17-19-10_mouse-0232'

data_files = [str(path.relative_to(base_path)) for path in base_path.glob("**/*")]
pprint(sorted(data_files))
['.DS_Store',
 '2021-02-26_17-19-12_mouse-0232',
 '2021-02-26_17-19-12_mouse-0232/103_ADC1.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC2.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC3.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC4.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC5.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC6.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC7.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_ADC8.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_AUX1.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_AUX2.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_AUX3.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH1.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH10.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH11.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH12.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH13.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH14.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH15.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH16.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH17.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH18.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH19.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH2.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH20.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH21.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH22.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH23.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH24.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH25.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH26.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH27.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH28.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH29.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH3.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH30.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH31.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH32.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH4.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH5.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH6.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH7.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH8.continuous',
 '2021-02-26_17-19-12_mouse-0232/103_CH9.continuous',
 '2021-02-26_17-19-12_mouse-0232/Continuous_Data.openephys',
 '2021-02-26_17-19-12_mouse-0232/TT0.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT1.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT2.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT3.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT4.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT5.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT6.spikes',
 '2021-02-26_17-19-12_mouse-0232/TT7.spikes',
 '2021-02-26_17-19-12_mouse-0232/all_channels.events',
 '2021-02-26_17-19-12_mouse-0232/messages.events',
 '2021-02-26_17-19-12_mouse-0232/messages_bak.events',
 '2021-02-26_17-19-12_mouse-0232/notebook.mat',
 '2021-02-26_17-19-12_mouse-0232/settings.xml',
 '2021-02-26_17-19-12_mouse-0232/stimlog.txt',
 'Sky_mouse-0232_2021-02-26T17_19_10.csv',
 'Sky_mouse-0232_2021-02-26T17_19_10.mp4',
 'TTL_mouse-0232_2021-02-26T17_19_10.csv']

Together, these files make up a dataset of:

  • Continuous extracellular ephys data recorded by Open Ephys

  • Spikes sorted by Kilosort

  • Stimulus information from some custom behavioral software

  • Raw video of the behaving animal.
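To get a quick feel for how a listing like the one above breaks down, the files can be tallied by extension. This is a minimal standard-library sketch that does not use this package; `example_files` is a hand-copied subset of the listing and `count_by_suffix` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import PurePosixPath

# A few relative paths copied from the listing above (not the full dataset).
example_files = [
    "2021-02-26_17-19-12_mouse-0232/103_CH1.continuous",
    "2021-02-26_17-19-12_mouse-0232/103_CH2.continuous",
    "2021-02-26_17-19-12_mouse-0232/TT0.spikes",
    "2021-02-26_17-19-12_mouse-0232/all_channels.events",
    "2021-02-26_17-19-12_mouse-0232/notebook.mat",
    "Sky_mouse-0232_2021-02-26T17_19_10.csv",
    "Sky_mouse-0232_2021-02-26T17_19_10.mp4",
    "TTL_mouse-0232_2021-02-26T17_19_10.csv",
]


def count_by_suffix(paths):
    """Count how many paths end in each file extension."""
    return Counter(PurePosixPath(p).suffix for p in paths)


suffix_counts = count_by_suffix(example_files)
for suffix, count in sorted(suffix_counts.items()):
    print(f"{suffix}: {count}")
```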

Different parts of the metadata are:

  • encoded in the file paths

  • embedded in a .mat file

  • stored in a .txt file

  • stored in a .csv file

We’ll use our fancy new tools in three steps:

  1. Add metadata with NWBConverter.add_metadata

  2. Add nwb-conversion-tools interfaces to common data formats with .add_interface

  3. Add base pynwb container types with .add_container

Before any of those steps, we create our converter object, which stores the abstract representation of our data format and handles the conversion to NWB:

[3]:
converter = NWBConverter(base_path)

Add Metadata!#

The first step is to add general file-level metadata about the experiment, the researcher, etc. We can see what fields are available/expected from NWB by default with our converter!

It’s a little verbose, so to keep this notebook readable we’ll just print the field names of the ‘NWBFile’ metadata container:

[4]:
sorted([field['name'] for field in converter.base_nwb_metadata['NWBFile']])
[4]:
['data_collection',
 'electrodes',
 'epoch_tags',
 'epochs',
 'experiment_description',
 'experimenter',
 'file_create_date',
 'identifier',
 'institution',
 'invalid_times',
 'keywords',
 'lab',
 'notes',
 'pharmacology',
 'protocol',
 'related_publications',
 'session_description',
 'session_id',
 'session_start_time',
 'slices',
 'source_script',
 'source_script_file_name',
 'stimulus_notes',
 'subject',
 'surgery',
 'sweep_table',
 'timestamps_reference_time',
 'trials',
 'units',
 'virus']

Static Metadata#

The simplest metadata is static metadata that you don’t expect to change across all instances of this data format. We can call add_metadata with a dictionary of static metadata, in this case nested within the 'NWBFile' container.

[5]:
converter.add_metadata({
    'NWBFile': {
        'institution': "University of Oregon",
        'lab': 'Wehr'
    }
})
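Because metadata arrives as nested dictionaries from several sources, it helps to picture how static values and parsed values could end up in one structure. The sketch below shows a recursive merge of two such dictionaries. It is an illustration only: `merge_metadata` is a hypothetical helper, not this package's implementation, and the `parsed` values are the ones the path spec later in this notebook returns.

```python
from copy import deepcopy


def merge_metadata(base, update):
    """Return a new dict with `update` merged into `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a container such as 'NWBFile': merge field by field.
            merged[key] = merge_metadata(merged[key], value)
        else:
            # New key, or a non-dict value: the update wins.
            merged[key] = deepcopy(value)
    return merged


static = {"NWBFile": {"institution": "University of Oregon", "lab": "Wehr"}}
parsed = {
    "Subject": {"subject_id": "0232"},
    "NWBFile": {"session_start_time": "2021-02-26T17_19_10"},
}

combined = merge_metadata(static, parsed)
print(combined)
```

Returning a new dictionary rather than mutating `base` keeps the static metadata reusable across datasets.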

Metadata from paths - the spec module#

This package relies heavily on its .spec module, which gives us tools to express where data is stored in different forms.

One common pattern is to encode some metadata in file and directory names. In this case the subject ID is encoded in several of the paths. We will use that to start adding metadata for the other default container in NWB, 'Subject', which has these field names:

[6]:
sorted([field['name'] for field in converter.base_nwb_metadata['Subject']])
[6]:
['age',
 'date_of_birth',
 'description',
 'genotype',
 'sex',
 'species',
 'subject_id',
 'weight']

Let’s use this filename (it doesn’t matter which, as long as it will be present in all datasets you’re applying this converter to):

Sky_mouse-0232_2021-02-26T17_19_10.csv

The subject id 0232 is embedded, and lucky for us so is the experiment start time! We can specify that to the converter like this:

[7]:
our_first_spec = spec.Path(
    'Sky_mouse-{Subject[subject_id]}_{NWBFile[session_start_time]}.csv'
)

Note how we replaced the parts of the string we want to parse out with {bracketed} terms; these define what to call the variables we extract. We can give nested names (i.e. to conform to the container structure of NWB files) using [] square brackets.
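To make the template idea concrete, here is one way such a string could be turned into a regular expression and a nested dictionary using only the standard library. This is a sketch of the general technique, not how spec.Path is implemented; `template_to_regex`, `set_nested`, and `parse_filename` are hypothetical names, and the non-greedy `(.+?)` capture is an assumption that happens to work for this filename.

```python
import re

# Matches one {bracketed} term in a template.
FIELD = re.compile(r"\{([^{}]+)\}")


def template_to_regex(template):
    """Compile a '{Container[field]}' template into a regex plus the ordered field names."""
    names = []
    pattern = ""
    pos = 0
    for match in FIELD.finditer(template):
        # Literal text between fields must match exactly.
        pattern += re.escape(template[pos:match.start()])
        pattern += "(.+?)"
        names.append(match.group(1))
        pos = match.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern + r"\Z"), names


def set_nested(target, name, value):
    """Store value under a 'Container[field]' name as target['Container']['field']."""
    keys = re.findall(r"[^\[\]]+", name)
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


def parse_filename(template, filename):
    """Return the nested values extracted from filename, or None if it does not match."""
    regex, names = template_to_regex(template)
    match = regex.match(filename)
    if match is None:
        return None
    result = {}
    for name, value in zip(names, match.groups()):
        set_nested(result, name, value)
    return result


template = "Sky_mouse-{Subject[subject_id]}_{NWBFile[session_start_time]}.csv"
parsed = parse_filename(template, "Sky_mouse-0232_2021-02-26T17_19_10.csv")
print(parsed)
```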

We can preview what the output of our spec object will look like by calling its parse method with the directory to look in:

[8]:
our_first_spec.parse(base_path)
[8]:
{'Subject': {'subject_id': '0232'},
 'NWBFile': {'session_start_time': '2021-02-26T17_19_10'}}
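Note that the parsed session_start_time is still a string, while pynwb expects a datetime for this field. A minimal sketch of the conversion follows, assuming the filename timestamp always follows the pattern seen in this example; `parse_session_start` is a hypothetical helper. The filename does not state a timezone, so the result is a naive datetime.

```python
from datetime import datetime

# Assumed format, inferred from the single example '2021-02-26T17_19_10'.
FILENAME_TIME_FORMAT = "%Y-%m-%dT%H_%M_%S"


def parse_session_start(raw):
    """Convert a filename timestamp into a (timezone-naive) datetime."""
    return datetime.strptime(raw, FILENAME_TIME_FORMAT)


start = parse_session_start("2021-02-26T17_19_10")
print(start.isoformat())
```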

Metadata in Files#

Another common pattern is to store metadata in one or several structured files, like .json, .csv, .mat, and so on. No prob. A lot of our metadata in this case is located in the notebook.mat file.

We can use one of our helper functions to preview what’s in it:

[9]:
mat_meta = spec.external_file.load_clean_mat(
    list(base_path.glob('**/notebook.mat'))[0]
)
mat_meta['nb']

[9]:
{'user': 'Molly',
 'mouseID': '0232',
 'Depth': 'unknown',
 'datapath': 'Z:\\lab\\djmaus\\Data\\Molly',
 'activedir': '\\\\wehrrig4\\d\\lab\\djmaus\\Data\\Molly\\2021-02-26_17-19-10_mouse-0232\\2021-02-26_17-19-12_mouse-0232',
 'LaserPower': 'unknown',
 'mouseDOB': 'age unknown',
 'mouseSex': 'sex unknown',
 'mouseGenotype': 'genotype unknown',
 'Drugs': 'none',
 'notes': array([], dtype='<U1'),
 'Reinforcement': 'none'}
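Several fields in this file hold placeholders such as 'sex unknown' rather than real values, and these probably should not be written into an NWB file as-is. One possible way to filter them is sketched below. The rule "any string containing the word unknown is a placeholder" is an assumption made for this example, and `clean_placeholder` is a hypothetical helper, not part of this package.

```python
def clean_placeholder(value):
    """Return None for strings that contain the word 'unknown', else the value itself."""
    if isinstance(value, str):
        text = value.strip()
        if not text or "unknown" in text.lower().split():
            return None
        return text
    # Non-string values (for example arrays) are passed through untouched.
    return value


# A subset of the fields shown in the output above.
nb = {
    "user": "Molly",
    "mouseID": "0232",
    "mouseDOB": "age unknown",
    "mouseSex": "sex unknown",
    "mouseGenotype": "genotype unknown",
    "Drugs": "none",
}

cleaned = {key: clean_placeholder(value) for key, value in nb.items()}
known = {key: value for key, value in cleaned.items() if value is not None}
print(known)
```

'Drugs': 'none' is kept on purpose, since "no drugs" is information rather than a missing value.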

We can add metadata from the file using the Mat object, which in this case needs us to specify the key separately. We don’t really care about the rest of the path (it might change), and there should only be one notebook, so we can just glob away the rest of the path as well.

Say, for example, we want to get the experimenter’s name:

[10]:

mat_spec = spec.Mat(
    path='**/notebook.mat',  # 2 **s mean we can glob recursively
    key="user",  # hold up on the nested ones for this,
    field=('nb', 'user')
)
mat_spec.parse(base_path)
[10]:
{'user': 'Molly'}
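The field tuple ('nb', 'user') reads as a path of keys into the loaded structure. The sketch below shows that lookup on a plain dictionary. It is an illustration of the idea rather than the Mat object's actual code; `get_nested` is a hypothetical helper and `mat_meta` here is a hand-written stand-in for the loaded file.

```python
from functools import reduce


def get_nested(data, field):
    """Follow a sequence of keys into nested dictionaries and return the final value."""
    return reduce(lambda level, key: level[key], field, data)


# Stand-in for the loaded .mat contents, using values shown earlier.
mat_meta = {"nb": {"user": "Molly", "mouseID": "0232"}}

# Store the value found at field under the flat name given by key.
key, field = "user", ("nb", "user")
result = {key: get_nested(mat_meta, field)}
print(result)
```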

Add Interfaces!#

We have some Open Ephys data here! Calling add_interface with only the interface type and name prints the source schema that the interface expects:

[11]:
converter.add_interface('recording', 'open_ephys')
Source Schema for ABCMeta
-------------------------
{'additionalProperties': True,
 'properties': {'folder_path': {'description': 'Path to directory containing '
                                               'OpenEphys files.',
                                'format': 'directory',
                                'type': 'string'}},
 'required': ['folder_path'],
 'type': 'object'}
-------------------------
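The printed schema says what the interface needs from us: an object with a required string folder_path. As a rough illustration of what that requirement means, the sketch below checks a candidate dictionary against the required keys and basic types only. `check_source` is a hypothetical helper written for this example; full JSON Schema validation is better left to a dedicated library such as jsonschema.

```python
# The schema printed above, copied as a plain dictionary.
schema = {
    "additionalProperties": True,
    "properties": {
        "folder_path": {
            "description": "Path to directory containing OpenEphys files.",
            "format": "directory",
            "type": "string",
        }
    },
    "required": ["folder_path"],
    "type": "object",
}

# Only the JSON types that appear in this schema are mapped here.
JSON_TYPES = {"string": str, "object": dict}


def check_source(source, schema):
    """Return a list of problems found in source; an empty list means the check passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in source:
            problems.append(f"missing required key: {key}")
    for key, rules in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rules.get("type"))
        if key in source and expected is not None and not isinstance(source[key], expected):
            problems.append(f"{key} should be of type {rules['type']}")
    return problems


print(check_source({}, schema))
print(check_source({"folder_path": "2021-02-26_17-19-12_mouse-0232"}, schema))
```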
[12]:
converter.add_interface(
    'recording', 'open_ephys',
    spec.Glob(
        key="folder_path",
        format="*mouse*",
        only_dirs=True
    )
)
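The only_dirs=True argument matters here because several files in the top-level folder also contain "mouse" in their names. The following standard-library sketch shows the same kind of filtering with pathlib on a throwaway directory. It is not the spec.Glob implementation, and `glob_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def glob_dirs(base, pattern):
    """Return the sorted directories under base whose names match pattern."""
    return sorted(p for p in Path(base).glob(pattern) if p.is_dir())


# Build a tiny stand-in for the dataset layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "2021-02-26_17-19-12_mouse-0232").mkdir()
    (base / "Sky_mouse-0232_2021-02-26T17_19_10.csv").touch()

    everything = sorted(p.name for p in base.glob("*mouse*"))
    matches = [p.name for p in glob_dirs(base, "*mouse*")]

print(everything)  # both the folder and the .csv match the pattern
print(matches)     # only the folder remains after the directory filter
```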

Run the conversion!!#

[13]:
# converter.run_conversion(nwbfile_path='nwbfile.nwb')