
File Index

The file index stores Lucene documents for files associated with BioStudies submissions.

It is populated from file-level metadata, which the file document factory normalizes into Lucene documents. The index contains both core file fields and dynamic attribute fields discovered during indexing.

Main fields

| Field name | Type | Stored | Indexed | Description | Notes |
|---|---|---|---|---|---|
| id | string | Yes | Yes | Unique document identifier for the file entry. | Derived from file identity and position. |
| file_position | long | Yes | No | Position of the file within the submission. | Used for ordering. |
| file_path | string | Yes | Yes | Full path of the file. | Uses filePath when available, otherwise falls back to relPath. |
| file_name | string | Yes | Yes | File name. | Derived from fileName or from the last path segment. Indexed in both normalized and exact form. |
| file_size | long | Yes | Yes | File size, usually in bytes. | Defaults to 0 when missing. |
| file_section | string | Yes | Yes | Accession of the section the file belongs to. | Present for non-study root sections only. |
| file_type | string | Yes | Yes | File entry type. | Standard value is file. |
| file_isDirectory | string | Yes | Yes | Whether the entry is a directory. | Stored as the string true or false. |
| file_owner | string | Yes | Yes | Accession of the submission that owns the file. | Links the file document back to its parent submission. |
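The fallback rules in the table (path, name, size, section, directory flag) can be sketched as a single mapping step. This is a hypothetical illustration, not the actual file document factory: a `Map<String, String>` stands in for a Lucene `Document`, and the metadata keys (`filePath`, `relPath`, `fileName`, `size`, `type`, `isDirectory`), the method name `coreFields`, and the example accessions are all assumptions. The `id` field is omitted because its exact derivation scheme is not specified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: derives the core file-index fields from one file's
// metadata. A Map<String, String> stands in for a Lucene Document, and the
// metadata keys used below are assumptions, not a confirmed input format.
public class CoreFileFieldsSketch {

    public static Map<String, String> coreFields(Map<String, Object> meta,
                                                 String ownerAccession,
                                                 String sectionAccession,
                                                 boolean sectionIsStudyRoot,
                                                 long position) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("file_position", String.valueOf(position));

        // file_path: prefer filePath, otherwise fall back to relPath.
        String path = firstNonBlank(meta.get("filePath"), meta.get("relPath"));
        putIfNotNull(doc, "file_path", path);

        // file_name: prefer fileName, otherwise take the last path segment.
        putIfNotNull(doc, "file_name", firstNonBlank(meta.get("fileName"), lastSegment(path)));

        // file_size: default to 0 when the size is missing or not numeric.
        Object size = meta.get("size");
        doc.put("file_size", String.valueOf(size instanceof Number ? ((Number) size).longValue() : 0L));

        // file_section: only for sections other than the study root section.
        if (!sectionIsStudyRoot && sectionAccession != null) {
            doc.put("file_section", sectionAccession);
        }

        // file_type: "file" unless the metadata provides a type (assumed default).
        doc.put("file_type", firstNonBlank(meta.get("type"), "file"));

        // file_isDirectory: stored as the string "true" or "false".
        doc.put("file_isDirectory",
                String.valueOf(Boolean.parseBoolean(String.valueOf(meta.get("isDirectory")))));

        doc.put("file_owner", ownerAccession);
        return doc;
    }

    // Returns the first candidate that is non-null and not blank, else null.
    private static String firstNonBlank(Object... candidates) {
        for (Object c : candidates) {
            if (c != null && !c.toString().isBlank()) {
                return c.toString();
            }
        }
        return null;
    }

    // Returns the part of the path after the last '/', or the whole path.
    private static String lastSegment(String path) {
        if (path == null) {
            return null;
        }
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    private static void putIfNotNull(Map<String, String> doc, String field, String value) {
        if (value != null) {
            doc.put(field, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("relPath", "data/raw/sample1.fastq");
        meta.put("size", 1024L);
        System.out.println(coreFields(meta, "S-BSST1", "Sec-1", false, 3L));
    }
}
```

Keeping the fallbacks in one place means every file entry ends up with a predictable set of core fields, even when its metadata is sparse.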

Dynamic fields

The file index also stores dynamic attribute fields discovered in file metadata.

These fields:

  • are indexed and stored
  • have their indexed values normalized to lowercase
  • are added only when the attribute name/value pair is valid and not duplicated

Examples include file-level metadata such as description, format, or other submission-specific attributes.
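The rules for dynamic fields can be sketched as a small filter over attribute name/value pairs. This is a hypothetical sketch under stated assumptions: "valid" is taken to mean that both name and value are non-blank, duplicates are detected on the lowercased name/value pair, and the `excludedNames` parameter models the per-submission-type exclusions mentioned in the notes. The class `DynamicField` and the method `dynamicFields` are illustrative names, not the indexer's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the dynamic-attribute rules. Assumptions: an
// attribute is valid when both name and value are non-blank, duplicates are
// detected on the lowercased name/value pair, and excludedNames models
// per-submission-type exclusions. DynamicField stands in for Lucene fields.
public class DynamicAttributeSketch {

    public static final class DynamicField {
        public final String name;
        public final String indexedValue; // lowercased value used for search
        public final String storedValue;  // original (trimmed) value kept for display

        DynamicField(String name, String indexedValue, String storedValue) {
            this.name = name;
            this.indexedValue = indexedValue;
            this.storedValue = storedValue;
        }

        @Override
        public String toString() {
            return name + "=" + storedValue + " (indexed as " + indexedValue + ")";
        }
    }

    public static List<DynamicField> dynamicFields(List<String[]> attributes, Set<String> excludedNames) {
        List<DynamicField> fields = new ArrayList<>();
        Set<String> seenPairs = new HashSet<>();
        for (String[] attr : attributes) {
            if (attr == null || attr.length < 2) {
                continue;
            }
            String name = attr[0] == null ? "" : attr[0].trim();
            String value = attr[1] == null ? "" : attr[1].trim();
            // Skip invalid pairs (blank name or blank value).
            if (name.isEmpty() || value.isEmpty()) {
                continue;
            }
            // Skip attributes excluded for this submission type.
            if (excludedNames.contains(name.toLowerCase(Locale.ROOT))) {
                continue;
            }
            String lowered = value.toLowerCase(Locale.ROOT);
            // Skip name/value pairs that were already added.
            if (!seenPairs.add(name.toLowerCase(Locale.ROOT) + "\t" + lowered)) {
                continue;
            }
            fields.add(new DynamicField(name, lowered, value));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String[]> attrs = List.of(
                new String[] {"Description", "Raw Reads"},
                new String[] {"Format", "FASTQ"},
                new String[] {"Format", "fastq"},   // duplicate once lowercased
                new String[] {"Type", " "});        // invalid: blank value
        dynamicFields(attrs, Set.of()).forEach(System.out::println);
    }
}
```

Lowercasing with `Locale.ROOT` keeps the indexed form independent of the server's default locale, while the stored value preserves the original casing for display.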

Notes

  • The file index is built from file-level JSON metadata.
  • Core fields are always created by the file document factory.
  • Dynamic attributes extend the index per submission, depending on the attributes present in each file's metadata.
  • Some attributes may be excluded for specific submission types to avoid conflicting metadata.