
Submission Index

The submission index is the main Lucene index used for BioStudies submission search and retrieval.

It stores the indexed representation of submission-level metadata. This covers the fields defined for the public collection in the collections registry, plus any collection-specific fields, along with a few filesystem-related fields used by the indexing pipeline.
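The field set therefore comes from three sources. The minimal sketch below shows how they might be combined into one schema. `FieldDef` and `build_index_schema` are hypothetical names. The conflict rule, which rejects a field name defined twice, is also an assumption: this page does not say how name clashes between sources are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    # Hypothetical field definition; attribute names are illustrative only.
    name: str
    type: str
    stored: bool
    indexed: bool


def build_index_schema(public_fields, collection_fields, filesystem_fields):
    """Combine public, collection-specific and filesystem fields into one
    name -> FieldDef mapping, preserving definition order.

    Assumption: a name defined by more than one source is treated as an
    error rather than silently overridden.
    """
    schema = {}
    for source in (public_fields, collection_fields, filesystem_fields):
        for field in source:
            if field.name in schema:
                raise ValueError(f"duplicate field definition: {field.name}")
            schema[field.name] = field
    return schema
```

For example, passing `accession` and `title` from the public definition, one hypothetical collection-specific field, and `relPath` / `storageMode` as filesystem fields yields a single mapping containing all five.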

Main fields

| Field name | Type | Stored | Indexed | Description | Notes |
|---|---|---|---|---|---|
| access | tokenized_string | No | Yes | Access-related metadata used for filtering and access control. | Uses a custom parser and analyzer. Values are normalized to lowercase. |
| accession | tokenized_string | Yes | Yes | Submission accession identifier. | Retrieved as a core identifier field. |
| type | untokenized_string | Yes | Yes | Submission type. | Stored for retrieval and used in filtering. |
| title | tokenized_string | Yes | Yes | Submission title. | Used for sorting and analyzed as full text. |
| author | tokenized_string | Yes | Yes | Author names extracted from the submission. | Multi-value field parsed from structured content. |
| content | tokenized_string | No | Yes | Full-text content assembled from submission sections, files, and links. | Main full-text search field. |
| links | long | Yes | Yes | Number of link entries associated with the submission. | Numeric field used for sorting and retrieval. |
| files | long | Yes | Yes | Number of file entries associated with the submission. | Numeric field used for sorting and retrieval. |
| release_date | untokenized_string | Yes | Yes | Submission release date. | Stored as a retrievable field. |
| ctime | long | No | Yes | Submission creation time. | Numeric timestamp field. |
| mtime | long | No | Yes | Submission modification time. | Numeric timestamp field. |
| relPath | untokenized_string | No | Yes | Relative path of the indexed submission source. | Filesystem-oriented metadata field. |
| storageMode | untokenized_string | No | Yes | Storage mode for the submission. | Defaults to NFS when not provided. |
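The Stored and Indexed columns determine what a search hit can return. In Lucene, a field that is indexed but not stored can be matched or filtered on, but its value cannot be read back from a hit. The sketch below transcribes the table into data and splits the fields into those two groups. `FieldSpec`, `SUBMISSION_FIELDS` and the helper functions are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    type: str
    stored: bool
    indexed: bool


# Field properties transcribed from the submission index table.
SUBMISSION_FIELDS = {
    "access": FieldSpec("tokenized_string", False, True),
    "accession": FieldSpec("tokenized_string", True, True),
    "type": FieldSpec("untokenized_string", True, True),
    "title": FieldSpec("tokenized_string", True, True),
    "author": FieldSpec("tokenized_string", True, True),
    "content": FieldSpec("tokenized_string", False, True),
    "links": FieldSpec("long", True, True),
    "files": FieldSpec("long", True, True),
    "release_date": FieldSpec("untokenized_string", True, True),
    "ctime": FieldSpec("long", False, True),
    "mtime": FieldSpec("long", False, True),
    "relPath": FieldSpec("untokenized_string", False, True),
    "storageMode": FieldSpec("untokenized_string", False, True),
}


def retrievable_fields(fields):
    """Fields whose values can be read back from a search hit."""
    return [name for name, spec in fields.items() if spec.stored]


def search_only_fields(fields):
    """Fields that can be searched or filtered on but are not returned."""
    return [name for name, spec in fields.items() if spec.indexed and not spec.stored]
```

With this table, `retrievable_fields` returns `accession`, `type`, `title`, `author`, `links`, `files` and `release_date`. Everything else, including `content`, `ctime` and `mtime`, is only available for matching, filtering or sorting.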

Notes

  • The fields above are driven by the public collection definition in the collections registry, plus the collection-specific fields defined in each collection's index mappings.
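  • Some fields are derived by parsers rather than taken directly from a single JSON path. For example, author is parsed from structured content, and content is assembled from sections, files, and links.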
  • content is the main searchable text field.
  • access is processed by a custom parser and analyzer, and its values are normalized to lowercase.
  • Numeric fields such as links, files, ctime, and mtime are used for sorting and metadata handling rather than free-text search.
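Several of the notes above describe derived rather than copied values. The minimal sketch below shows that idea on a hypothetical, simplified submission dict. The keys `sections`, `files[].path`, `links[].url` and `storageMode` are assumptions and do not reflect the real BioStudies submission JSON. The sketch only mirrors what this page states: lowercase access values, file and link counts, assembled full-text content, and the NFS default for storage mode. The real custom access analyzer is not reproduced; plain lowercasing stands in for it.

```python
def derive_fields(submission):
    """Derive a few submission index fields from a hypothetical, simplified
    submission dict (not the real BioStudies JSON layout)."""
    files = submission.get("files", [])
    links = submission.get("links", [])

    # access: values normalized to lowercase (stand-in for the custom analyzer).
    access = [value.lower() for value in submission.get("access", [])]

    # content: full text assembled from section text, file paths and link URLs.
    content_parts = [
        *submission.get("sections", []),
        *(f["path"] for f in files),
        *(link["url"] for link in links),
    ]

    return {
        "accession": submission["accession"],
        "access": access,
        # links / files: counts of entries, kept numeric for sorting.
        "links": len(links),
        "files": len(files),
        "content": " ".join(content_parts),
        # storageMode: falls back to NFS when missing or empty.
        "storageMode": submission.get("storageMode") or "NFS",
    }
```

Because `links` and `files` come out as integers, a caller can order results numerically, for example `sorted(docs, key=lambda d: d["files"], reverse=True)`, rather than comparing them as text.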