Submission Index
The submission index is the main Lucene index used for BioStudies submission search and retrieval.
It stores the indexed representation of submission-level metadata from the public collection in
the registry + the collection-specific ones, along with a few filesystem-related fields used by the
indexing pipeline.
Main fields
| Field Name | Type | Stored | Indexed | Description | Notes |
|---|---|---|---|---|---|
access |
tokenized_string |
No | Yes | Access-related metadata used for filtering and access control. | Uses a custom parser and analyzer. Values are normalized to lowercase. |
accession |
tokenized_string |
Yes | Yes | Submission accession identifier. | Retrieved as a core identifier field. |
type |
untokenized_string |
Yes | Yes | Submission type. | Stored for retrieval and used in filtering. |
title |
tokenized_string |
Yes | Yes | Submission title. | Sorted and analyzed as full text. |
author |
tokenized_string |
Yes | Yes | Author names extracted from the submission. | Multi-value field parsed from structured content. |
content |
tokenized_string |
No | Yes | Full-text content assembled from submission sections, files, and links. | Main full-text search field. |
links |
long |
Yes | Yes | Count of link entries associated with the submission. | Numeric field used for sorting and retrieval. |
files |
long |
Yes | Yes | Count of file entries associated with the submission. | Numeric field used for sorting and retrieval. |
release_date |
untokenized_string |
Yes | Yes | Submission release date. | Stored as a retrievable field. |
ctime |
long |
No | Yes | Submission creation time. | Numeric timestamp field. |
mtime |
long |
No | Yes | Submission modification time. | Numeric timestamp field. |
relPath |
untokenized_string |
No | Yes | Relative path of the indexed submission source. | Used as a filesystem-oriented metadata field. |
storageMode |
untokenized_string |
No | Yes | Storage mode for the submission. | Defaults to NFS when not provided. |
Notes
- The fields above are driven by the
publiccollection definition in the collections registry, as well as the collection-specific fields defined in the collection-specific index mappings. - Some fields are derived by parsers rather than taken directly from a single JSON path.
contentis the main searchable text field.accessuses access-oriented parsing and analyzer behavior.- Numeric fields such as
links,files,ctime, andmtimeare used for sorting and metadata handling rather than free-text search.