What is liquid-liquid phase separation (LLPS)?
Liquid-liquid phase separation (LLPS), also known as liquid demixing, underlies the formation of
diverse membraneless organelles, such as stress granules, P-bodies, the nucleolus and postsynaptic
densities (PMID:28935776). These recently discovered and intensively researched biological
condensates are functional and structural units of cellular organization, and an ever-increasing
number of cellular functions turn out to rely heavily on them, including transcriptional regulation
and silencing (PMID:29930091 PMID:28636604) and the signal transduction networks of membrane
receptors (PMID:22398450 PMID:27056844). Their formation through LLPS is typically driven by weak,
multivalent interactions of LLPS driver regions. LLPS has emerged as a general mechanism for
organizing intracellular space, exploited not only by eukaryotic cells but also by bacteria and
viruses (PMID:30197298 PMID:28680096). Mutations affecting LLPS driver regions are implicated in
diverse neurological disorders, such as amyotrophic lateral sclerosis (ALS) and frontotemporal
dementia (FTD) (PMID:31188823), as well as in cancers and muscular atrophies (PMID:28817800
PMID:26481498).
The ability to drive LLPS is encoded in protein sequences, but it can be achieved by diverse
architectures, including low sequence complexity disordered regions, multivalent domain-motif
interactions, RNA-binding domains, oligomerization domains and combinations of these modules
(PMID:30099026). Therefore, the identification of such proteins is difficult. In recent years,
liquid-liquid phase separation of proteins leading to the formation of membraneless organelles has
become a prominent topic in molecular and structural cell biology, resulting in an avalanche of
high-impact publications. PhaSePro is the first comprehensive database serving as the central
resource for proteins mediating LLPS.
What information is contained in PhaSePro?
PhaSePro stores over one hundred proteins that are able to drive LLPS, all
carefully curated and supported by literature references. For all of these proteins, PhaSePro
provides:
- cross-references to UniProt entries, including accessions, gene/protein names and source species
- the membraneless organelle(s) (MO) formed, as defined in the Gene Ontology
- if the protein is part of a multi-component LLPS system, the other proteins needed for MO formation
- the sequence boundaries and characteristics of the experimentally validated LLPS driver region(s)
- a description of the functional relevance of the given LLPS system and related MO, and the functional class of the MO
- the types of molecular interactions involved in LLPS
- regulatory mechanisms of LLPS: post-translational modifications and alternative splicing events known to influence LLPS, and all UniProt isoforms containing sequence changes in the annotated LLPS regions
- disease mutations experimentally confirmed to affect LLPS
- literature references from PubMed
In addition, PhaSePro provides a description of the experiments in which LLPS behavior was
demonstrated, annotated with the Evidence and Conclusion Ontology (ECO), along with lists of the
molecular partners and other determinants that affect or regulate LLPS. Finally, PhaSePro uses
ProtVista (PMID:28334231), the recently released visualization tool of UniProt, extended with our
annotated LLPS regions, IUPred disorder predictions (PMID:29860432), Pfam annotations and PTMs from
PhosphoSitePlus. An overview of the PDB structures of the corresponding LLPS protein regions, if
available, is also provided on the entry pages.
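The per-entry information listed above can be pictured as one structured record. The sketch below is a hypothetical in-memory model, assuming Python 3.9+; the class names (`LLPSRegion`, `PhaSeProEntry`) and all field names are illustrative choices and are not PhaSePro's actual export schema. Values in the usage example are placeholders, not a real entry.

```python
from dataclasses import dataclass, field


@dataclass
class LLPSRegion:
    """An LLPS driver region in 1-based, inclusive residue numbering."""
    start: int
    end: int

    def __post_init__(self) -> None:
        # Reject empty or reversed boundaries early.
        if not 1 <= self.start <= self.end:
            raise ValueError(f"invalid region boundaries: {self.start}-{self.end}")


@dataclass
class PhaSeProEntry:
    """Hypothetical model of one entry; field names are illustrative, not an export schema."""
    uniprot_accession: str
    gene_name: str
    organism: str
    organelles: list[str]          # membraneless organelle(s), as GO cellular component terms
    regions: list[LLPSRegion]      # experimentally validated LLPS driver region(s)
    partners: list[str] = field(default_factory=list)           # other proteins needed for MO formation
    interaction_types: list[str] = field(default_factory=list)  # terms from the interaction CV
    functional_class: str = ""
    pmids: list[str] = field(default_factory=list)

    def is_multicomponent(self) -> bool:
        """True if the entry lists other proteins required for MO formation."""
        return bool(self.partners)


# Placeholder values only; not a real PhaSePro record.
entry = PhaSeProEntry(
    uniprot_accession="EXAMPLE1",
    gene_name="GENE1",
    organism="Homo sapiens",
    organelles=["stress granule"],
    regions=[LLPSRegion(1, 120)],
    partners=["GENE2"],
)
print(entry.is_multicomponent())  # True
```

Validating region boundaries in `__post_init__` means a malformed record fails when it is built rather than later, when it is displayed or compared.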
Server functionality
Browsing/Searching
- Entries can be ordered and filtered based on name, organelle type and organism
- Entries can be filtered based on the molecular determinants of LLPS, such as RNA dependency and the involvement of domain-motif interactions
- A dedicated search field can be used to search for full or partial common/UniProt names, or UniProt accessions
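The browsing and search options above combine exact-match filters, ordering by name and a substring search. The following is a minimal local sketch of that logic over an in-memory list, assuming a simplified, hypothetical `Entry` record with one organelle and one RNA-dependency flag per entry; it is not the server's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str            # common protein name
    accession: str       # UniProt accession
    organism: str
    organelle: str
    rna_dependent: bool  # one example of a molecular-determinant flag


def search(entries, query):
    """Case-insensitive match on a full or partial name, or on the accession."""
    q = query.strip().lower()
    return [e for e in entries if q in e.name.lower() or q in e.accession.lower()]


def browse(entries, organism=None, organelle=None, rna_dependent=None):
    """Apply the optional filters (None means no filter), then order by name."""
    hits = [
        e for e in entries
        if (organism is None or e.organism == organism)
        and (organelle is None or e.organelle == organelle)
        and (rna_dependent is None or e.rna_dependent == rna_dependent)
    ]
    return sorted(hits, key=lambda e: e.name.lower())


# Placeholder entries for illustration.
es = [
    Entry("Beta protein", "X00002", "Homo sapiens", "stress granule", True),
    Entry("alpha protein", "X00001", "Mus musculus", "nucleolus", False),
]
print([e.name for e in browse(es)])  # ['alpha protein', 'Beta protein']
```

Ordering on the lowercased name keeps the listing alphabetical regardless of how names are capitalized.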
Entry pages
For each entry, the dedicated entry page details information relevant to the protein’s involvement
in LLPS, including:
- Basic data about the protein, including UniProt cross-reference, gene/protein names and source organism
- Basic data about the LLPS, including the region(s) of the protein shown to mediate LLPS with literature references, and the type of membraneless organelle formed, linked to Gene Ontology terms
- A molecular feature viewer showing the protein region involved in LLPS, disorder prediction via IUPred, available structures from the PDB, domain and site definitions, known PTMs and sequence variants
- A structure viewer using LiteMol to depict structures from the PDB overlapping with the LLPS regions
- Detailed information about the LLPS process, including a description of the functional relevance of the membraneless organelle formed, its functional class, the biomolecules involved in LLPS, and the molecular determinants and interaction types involved in LLPS
- Regulation of LLPS, including post-translational modifications and isoforms, together with known germline mutations that affect condensation
- A detailed description of the experimental procedures in the supporting literature that serve as evidence for the involvement of the protein in LLPS and that demonstrate the liquid state of the condensates
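The structure viewer shows PDB structures that overlap the LLPS regions, which comes down to a closed-interval overlap test. The sketch below assumes the structure segments have already been mapped onto UniProt residue numbering (for example via SIFTS). The structure IDs and boundaries are hypothetical.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a residue."""
    return a_start <= b_end and b_start <= a_end


def structures_overlapping(llps_regions, pdb_segments):
    """IDs of structure segments that overlap at least one LLPS region, sorted.

    llps_regions: list of (start, end) tuples in UniProt numbering
    pdb_segments: dict mapping a structure ID to its (start, end) in UniProt numbering
    """
    return sorted(
        sid
        for sid, (s_start, s_end) in pdb_segments.items()
        if any(overlaps(s_start, s_end, r_start, r_end) for r_start, r_end in llps_regions)
    )


# Hypothetical IDs and boundaries.
print(structures_overlapping([(1, 100)], {"S1": (50, 150), "S2": (101, 200)}))  # ['S1']
```

The same test works for checking whether an isoform's changed segment falls inside an annotated LLPS region.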
Download
Data can be downloaded in multiple ways. The whole database can be downloaded in standard JSON, TSV
or XML format for local programmatic processing. Individual entries can be accessed through our REST
API, as detailed in the Download section, or downloaded directly in these three formats from the top
of each entry page. Users can also select multiple entries on the "Browse/Search" page and download
them in these formats.
Annotate
PhaSePro encourages community-based pooling of knowledge by enabling users to submit new LLPS
proteins. The Annotate section offers two ways of communicating missing proteins to the database
curators. Users can fill out the full annotation document for their candidate, providing all the
information that will be featured on the entry page. Alternatively, they can fill out a simplified
submission form, providing only the basic information (the gene/protein name and the literature
reference(s) supporting LLPS), from which the database curators can create the final entry. In each
case, the provided information is checked for quality and consistency with PhaSePro standards before
inclusion in the PhaSePro core dataset.
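The simplified form needs only a gene/protein name and supporting references. Below is a hypothetical client-side pre-check a submitter might run before sending. It assumes references are given as PubMed IDs (with or without a `PMID:` prefix). It only checks form, and it does not replace the curators' quality and consistency review. `SimpleSubmission` and `validate` are illustrative names.

```python
import re
from dataclasses import dataclass, field

# Accepts "PMID:30099026" or "30099026"; a simplified pattern, assumed for illustration.
PMID_PATTERN = re.compile(r"^(?:PMID:)?(\d{1,8})$")


@dataclass
class SimpleSubmission:
    """Hypothetical model of the simplified form: a name plus supporting references."""
    protein_name: str
    references: list[str] = field(default_factory=list)


def validate(sub):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if not sub.protein_name.strip():
        problems.append("missing gene/protein name")
    if not sub.references:
        problems.append("at least one literature reference is required")
    for ref in sub.references:
        if not PMID_PATTERN.match(ref.strip()):
            problems.append(f"not a PMID: {ref!r}")
    return problems


print(validate(SimpleSubmission("GENE1", ["PMID:30099026"])))  # []
```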
List of abbreviations
LLPS - liquid-liquid phase separation
SG - stress granule
MO - membraneless organelle
NMR - Nuclear magnetic resonance
ITC - Isothermal titration calorimetry
FISH - Fluorescence in situ hybridization
FRAP - Fluorescence recovery after photobleaching
FRET - Fluorescent/Förster resonance energy transfer
GFP/YFP/CFP - green/yellow/cyan fluorescent protein
siRNA/shRNA - small interfering/hairpin RNA
DLS - Dynamic light scattering
LC/LCD - Low-complexity/Low-complexity domain
PLD/PrLD - Prion-like domain
IDP/IDR - Intrinsically disordered protein/region
Controlled vocabularies used in LLPS definitions
Membraneless organelles are linked to currently available cellular component terms in the Gene
Ontology (GO). A more refined classification creating new terms is being introduced into GO, in order
to more faithfully represent the heterogeneity of membraneless organelles.
Experimental procedures used for the study of liquid-liquid phase separation are defined as free
text, enriched with cross-references to the Evidence and Conclusion Ontology (ECO). The use of GO and
ECO is fully in line with the practices of core data resources, such as UniProt.
The functional classes of the membraneless organelles (MOs) follow a controlled vocabulary created
from the consensus of the functional classification schemes proposed in the following publications:
PMID:28808090, PMID:28864230, PMID:30826453, PMID:30682370.
PhaSePro currently uses the following 8 functional classes:
- activation/nucleation/signal amplification/bioreactor (MOs that activate reactions based on high local concentration of the components)
- inactivation/separation/molecular shield (MOs that inactivate reactions by sequestering some of the required components while keeping others outside)
- protective storage/reservoir (MOs that form to store/protect molecules in an inactive state for a certain period of time, e.g. during stress)
- biomolecular filter/selectivity barrier (MOs whose primary function is the selective concentration of certain molecules)
- sensor (MOs that form/dissolve upon environmental changes (pH, temperature, stress etc.) to signal them to the cell)
- regulator of spatial patterns (MOs that act as markers of cell polarity, e.g. to help asymmetric cell divisions)
- memory device (MOs whose primary function is to act as long-lasting molecular footprints of past external/internal signals)
- mechanical property exploitation (MOs whose primary function depends on the mechanical/elastic properties of the condensate itself)
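Because the functional classes form a closed vocabulary, client code can represent them as an enumeration so that out-of-vocabulary labels fail loudly. This is a minimal sketch: the member names are our own shorthand, and the values are the class labels from the list, without their parenthetical descriptions.

```python
from enum import Enum


class FunctionalClass(Enum):
    # Member names are illustrative shorthand; values are the CV labels.
    ACTIVATION = "activation/nucleation/signal amplification/bioreactor"
    INACTIVATION = "inactivation/separation/molecular shield"
    STORAGE = "protective storage/reservoir"
    FILTER = "biomolecular filter/selectivity barrier"
    SENSOR = "sensor"
    SPATIAL_PATTERNS = "regulator of spatial patterns"
    MEMORY = "memory device"
    MECHANICAL = "mechanical property exploitation"


def parse_class(label):
    """Map a label to its class, ignoring case and surrounding spaces.

    Raises ValueError for terms outside the controlled vocabulary.
    """
    return FunctionalClass(label.strip().lower())


print(parse_class(" Sensor ").name)  # SENSOR
```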
Molecular interactions crucial to the formation of liquid condensates through LLPS are defined in
the context of a purpose-built controlled vocabulary (CV). This CV is built on the review of Mittag
et al. (PMID:30099026) and contains the following terms:
- multivalent domain-motif interactions (e.g. SH3 domains and proline-rich motifs)
- multivalent domain-PTM interactions (e.g. SH2 domain - pY, UBA domain - ubiquitin, domains recognising histones with specific modifications)
- discrete oligomerization (via ordinary oligomerization domains; defined number of monomers/valency)
- linear oligomerization/self-association (undefined number of monomers/valency)
- coiled-coil formation
- helix-helix interaction driven oligomerization (e.g. formation of helical bundles)
- prion-like aggregation (typically Q/N rich regions)
- formation of amyloid-like/cross-beta/kinked/stacked beta-sheet structures
- protein-RNA interaction (often multivalent)
- protein-DNA interaction (often multivalent)
- simple coacervation of hydrophobic residues
- complex coacervation (IDRs with high net charge and large global dimensions form condensed droplets with oppositely charged polymers)
- electrostatic (cation-anion) interaction (typically claimed when blocks of opposite charges alternate)
- cation-π (cation-pi) interactions
- π-π (pi-pi) interactions
- dipole-dipole interactions
- RNA base pairing/RNA self-assembly
- weak electrostatic or hydrophobic interactions between folded domains (like Pab1 RRMs without RNA)
- gelation (formation of a system-spanning gel instead of condensed droplets)
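When curating annotations offline, interaction-type labels can be checked against this vocabulary before they are used. The sketch below compares labels case-insensitively and uses the standard library's `difflib.get_close_matches` to suggest the nearest valid term for likely typos. The term labels are abbreviated (parenthetical notes dropped, π written as "pi"), and `check_terms` is a hypothetical helper, not part of PhaSePro.

```python
import difflib

# Abbreviated CV labels; parenthetical notes from the list are omitted.
INTERACTION_TERMS = (
    "multivalent domain-motif interactions",
    "multivalent domain-PTM interactions",
    "discrete oligomerization",
    "linear oligomerization/self-association",
    "coiled-coil formation",
    "helix-helix interaction driven oligomerization",
    "prion-like aggregation",
    "formation of amyloid-like/cross-beta/kinked/stacked beta-sheet structures",
    "protein-RNA interaction",
    "protein-DNA interaction",
    "simple coacervation of hydrophobic residues",
    "complex coacervation",
    "electrostatic (cation-anion) interaction",
    "cation-pi interactions",
    "pi-pi interactions",
    "dipole-dipole interactions",
    "RNA base pairing/RNA self-assembly",
    "weak electrostatic or hydrophobic interactions between folded domains",
    "gelation",
)
# Lowercased lookup key -> canonical spelling.
CANONICAL = {t.lower(): t for t in INTERACTION_TERMS}


def check_terms(terms):
    """Return (accepted canonical terms, {rejected label: [closest valid term]})."""
    accepted, rejected = [], {}
    for t in terms:
        key = t.strip().lower()
        if key in CANONICAL:
            accepted.append(CANONICAL[key])
        else:
            close = difflib.get_close_matches(key, list(CANONICAL), n=1)
            rejected[t] = [CANONICAL[m] for m in close]
    return accepted, rejected


print(check_terms(["Coiled-coil formation", "coiled coil formaton"]))
```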
Based on a comprehensive reading of the LLPS literature, we created a custom-built CV that
establishes broad categories of LLPS cases according to the main molecular determinants and
mechanisms pertinent to LLPS:
- Membrane cluster
- Partner-dependent
- RNA-dependent
- PTM required for LLPS
- Domain-motif interactions involved
- Discrete oligomerization involved
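These determinant categories are not mutually exclusive, so one entry can count toward several of them. The sketch below tallies how many entries fall into each category. It assumes a hypothetical input that maps entry IDs to category labels, counts a label repeated within one entry only once, and rejects unknown labels.

```python
from collections import Counter

DETERMINANT_CATEGORIES = (
    "Membrane cluster",
    "Partner-dependent",
    "RNA-dependent",
    "PTM required for LLPS",
    "Domain-motif interactions involved",
    "Discrete oligomerization involved",
)


def tally_categories(entry_categories):
    """Count entries per category; an entry may belong to several categories.

    entry_categories: mapping of (hypothetical) entry ID -> iterable of category labels
    """
    counts = Counter({c: 0 for c in DETERMINANT_CATEGORIES})  # keep zero counts visible
    for entry_id, cats in entry_categories.items():
        for c in set(cats):  # count each category at most once per entry
            if c not in counts:
                raise ValueError(f"{entry_id}: unknown category {c!r}")
            counts[c] += 1
    return counts


t = tally_categories({"E1": ["RNA-dependent", "Partner-dependent"], "E2": ["RNA-dependent"]})
print(t["RNA-dependent"])  # 2
```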
The experimental observations supporting the liquid state of the condensates are defined using
terms from a separate controlled vocabulary, built on the review of Mitrea et al. (PMID:30017918),
which contains the following terms:
- dynamic movement/reorganization of molecules within the droplet (e.g. FRAP, NMR)
- dynamic exchange of molecules with surrounding solvent
- morphological traits (e.g. round shape, fusion of droplets, wetting)
- rheological traits (material properties, e.g. viscosity, surface tension, molecular network mesh size)
- sensitivity to 1,6-hexanediol
- temperature dependence
- reversibility of formation and dissolution (with changes in environmental conditions)
- other (none of the above, but supports the liquid material state)
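For illustration only, a keyword heuristic could pre-suggest liquid-state evidence terms from a free-text experiment description. It would not replace a curator. The keyword patterns below are our own assumption, not a PhaSePro rule. They use word boundaries so that, for example, "surrounding" does not trigger "round". The "dynamic exchange" and "other" terms are left out because simple keywords do not capture them well.

```python
import re

# Hypothetical keyword patterns per evidence term (term labels abbreviated).
EVIDENCE_PATTERNS = {
    "dynamic movement/reorganization of molecules within the droplet": re.compile(r"\b(frap|nmr)\b"),
    "morphological traits": re.compile(r"\b(round|spherical|fusion|fused?|wetting)\b"),
    "rheological traits": re.compile(r"\b(viscosity|surface tension)\b"),
    "sensitivity to 1,6-hexanediol": re.compile(r"hexanediol"),
    "temperature dependence": re.compile(r"\btemperature\b"),
    "reversibility of formation and dissolution": re.compile(r"\b(reversib\w*|dissol\w*)"),
}


def suggest_evidence_terms(description):
    """Suggest CV terms whose patterns occur in the description (case-insensitive)."""
    text = description.lower()
    return [term for term, pattern in EVIDENCE_PATTERNS.items() if pattern.search(text)]


print(suggest_evidence_terms("Droplets were round and recovered in FRAP."))
# ['dynamic movement/reorganization of molecules within the droplet', 'morphological traits']
```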