What is liquid-liquid phase separation (LLPS)?

Liquid-liquid phase separation (LLPS), also known as liquid demixing, underlies the formation of diverse membraneless organelles, such as stress granules, P-bodies, the nucleolus and postsynaptic densities (PMID:28935776). These intensively researched biological condensates are functional and structural units of cellular organization, and a growing number of cellular functions are found to rely on them, including transcriptional regulation and silencing (PMID:29930091 PMID:28636604) and the signal transduction networks of membrane receptors (PMID:22398450 PMID:27056844). Their formation through LLPS is typically driven by multivalent weak interactions of LLPS driver regions. LLPS has emerged as a general mechanism to organize the intracellular space, exploited not only by eukaryotic cells, but also by bacteria and viruses (PMID:30197298 PMID:28680096). Mutations affecting LLPS driver regions are implicated in diverse neurological disorders, such as amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) (PMID:31188823), as well as in cancers and muscular atrophies (PMID:28817800 PMID:26481498).
The ability to drive liquid-liquid phase separation (LLPS) is encoded in protein sequences, but it can be achieved by diverse architectures, including low sequence complexity disordered regions, multivalent domain-motif interactions, RNA binding domains, oligomerization domains and combinations of these modules (PMID:30099026). Identifying such proteins is therefore difficult. In recent years, liquid-liquid phase separation of proteins leading to the formation of membraneless organelles has become a prominent topic in molecular and structural cell biology, resulting in a surge of high-impact publications. PhaSePro is the first comprehensive database serving as the central resource for proteins mediating LLPS.

What information is contained in PhaSePro?

PhaSePro stores over one hundred proteins that are able to drive LLPS, all carefully curated and supported by literature references. For all of these proteins, PhaSePro provides:

  • cross-reference to UniProt entries, including accessions, gene/protein names and source species
  • the membraneless organelle(s) (MO) formed, as defined in the Gene Ontology
  • if the protein is part of a multi-component LLPS system, the definition of other proteins needed for the MO formation
  • the sequence boundaries and characteristics of the experimentally validated LLPS driver region(s)
  • a description of the functional relevance of the given LLPS system and related MO, and the functional class of the MO
  • the types of molecular interactions involved in LLPS
  • regulatory mechanisms of LLPS: post-translational modifications and alternative splicing events known to influence LLPS, and all UniProt isoforms containing sequence changes in the annotated LLPS regions
  • disease mutations confirmed to affect LLPS by experiments
  • literature references from PubMed
In addition, PhaSePro provides a description of the experiment in which LLPS behavior was demonstrated, using the Evidence and Conclusion Ontology (ECO), along with lists of the molecular partners and other determinants that affect or regulate LLPS. Finally, PhaSePro uses ProtVista, the recently released visualization tool of UniProt (PMID:28334231), extended with our annotated LLPS regions, IUPred disorder predictions (PMID:29860432), Pfam annotations and PTMs from PhosphoSitePlus. An overview of the PDB structures of the corresponding LLPS protein regions, if available, is also provided on the entry pages.

Server functionality

  • Entries can be ordered and filtered based on name, organelle type and organism
  • Entries can be filtered based on the molecular determinants of LLPS, such as RNA dependency and the involvement of domain-motif interactions
  • A dedicated search field can be used to search for full or partial common/UniProt names, or UniProt accessions

Entry pages

For each entry the dedicated entry page details information relevant to the protein’s involvement in LLPS, including:

  • Basic data about the protein, including UniProt cross-reference, gene/protein names and source organism
  • Basic data about the LLPS, including the region(s) of the protein shown to mediate LLPS with the supporting literature reference, and the type of membraneless organelle formed, linked to Gene Ontology terms
  • A molecular feature viewer, showing the protein region involved in LLPS, disorder prediction via IUPred, available structures from the PDB, domain and site definitions, known PTMs and sequence variants
  • A structure viewer using LiteMol to depict structures from the PDB overlapping with the LLPS regions
  • Detailed information about the LLPS process, including a description of the functional relevance of the membraneless organelle formed, its functional class, the biomolecules involved in the LLPS, and the molecular determinants and interaction types involved in LLPS
  • Regulation of LLPS including post-translational modifications and isoforms, together with known germline mutations that affect the condensation
  • A detailed description of the experimental procedures described in the supporting literature that serve as evidence for the involvement of the protein in LLPS, and that prove the liquid state of the condensates
Data can be downloaded in multiple ways. The whole database can be downloaded in standard JSON, TSV or XML format for local programmatic processing. Individual entries can be accessed through our REST API, as detailed in the Download section, or downloaded directly in the same three formats at the top of each entry page. Users can also select multiple entries on the "Browse/Search" page and download them in these formats.
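
As an illustration of local programmatic processing, the following minimal Python sketch parses a JSON export and filters entries by organism and by molecular determinant. The field names (`organism`, `determinants`), the accession-keyed layout and the sample entries are assumptions made up for this example; the actual structure of the PhaSePro download should be checked in the Download section before adapting it.

```python
import json

# Made-up sample standing in for a downloaded JSON export.
# Accessions, names and field names are hypothetical placeholders.
SAMPLE_EXPORT = """
{
  "X00001": {"common_name": "ExampleA", "organism": "Homo sapiens",
             "determinants": ["RNA-dependent"]},
  "X00002": {"common_name": "ExampleB", "organism": "Saccharomyces cerevisiae",
             "determinants": ["Partner-dependent"]},
  "X00003": {"common_name": "ExampleC", "organism": "Homo sapiens",
             "determinants": ["RNA-dependent", "PTM required for LLPS"]}
}
"""


def load_entries(text):
    """Parse a JSON export into a dict keyed by accession."""
    return json.loads(text)


def filter_entries(entries, organism=None, determinant=None):
    """Return sorted accessions matching the optional filters."""
    hits = []
    for accession, entry in entries.items():
        # Skip entries from other organisms when an organism is requested.
        if organism is not None and entry.get("organism") != organism:
            continue
        # Skip entries lacking the requested molecular determinant.
        if determinant is not None and determinant not in entry.get("determinants", []):
            continue
        hits.append(accession)
    return sorted(hits)


entries = load_entries(SAMPLE_EXPORT)
human_rna_dependent = filter_entries(
    entries, organism="Homo sapiens", determinant="RNA-dependent"
)
```

For a real export, the text would be read from the downloaded file instead of the embedded sample string.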

PhaSePro encourages community-based pooling of knowledge by enabling users to submit new LLPS proteins. The Annotate section offers two ways of communicating missing proteins to the database curators. Users can fill out the full annotation document for their candidate, providing all the information that will be featured on the entry page. Alternatively, they can fill out a simplified submission form, providing only the basic information (the gene/protein name and literature reference(s) supporting the LLPS), based on which the database curators can create the final entry. In each case, the provided information is checked for quality and consistency with PhaSePro standards before inclusion in the PhaSePro core dataset.

List of abbreviations

LLPS - liquid-liquid phase separation
SG - stress granule
MO - membraneless organelle
NMR - Nuclear magnetic resonance
ITC - Isothermal titration calorimetry
FISH - Fluorescence in situ hybridization
FRAP - Fluorescence recovery after photobleaching
FRET - Fluorescence/Förster resonance energy transfer
GFP/YFP/CFP - green/yellow/cyan fluorescent protein
siRNA/shRNA - small interfering/hairpin RNA
DLS - Dynamic light scattering
LC/LCD - Low-complexity/Low-complexity domain
PLD/PrLD - Prion-like domain
IDP/IDR - Intrinsically disordered protein/region

Controlled vocabularies used in LLPS definitions

Membraneless organelles are linked to currently available cellular component terms in the Gene Ontology (GO). A more refined classification creating new terms is being introduced into GO, in order to more faithfully represent the heterogeneity of membraneless organelles.

Experimental procedures used for the study of liquid-liquid phase separation are defined as free text, enriched with cross-references to the Evidence and Conclusion Ontology (ECO). The use of GO and ECO is fully in line with the practices of core data resources, such as UniProt.

The functional class of the membraneless organelles (MOs) follows a controlled vocabulary created from the consensus of the proposed functional classification schemes detailed in the following publications: PMID:28808090, PMID:28864230, PMID:30826453, PMID:30682370.
PhaSePro currently uses the following 8 functional classes:

  • activation/nucleation/signal amplification/bioreactor (MOs that activate reactions based on high local concentration of the components)
  • inactivation/separation/molecular shield (MOs that inactivate reactions by sequestering some of the required components while keeping others outside)
  • protective storage/reservoir (MOs that form to store/protect molecules in an inactive state for a certain period of time, e.g. during stress)
  • biomolecular filter/selectivity barrier (MOs whose primary function is the selective concentration of certain molecules)
  • sensor (MOs which form/dissolve on environmental changes (pH, temperature, stress etc.) to signal them to the cell)
  • regulator of spatial patterns (MOs which act as markers of cell polarity, e.g. help asymmetric cell divisions)
  • memory device (MOs whose primary function is to act as long-lasting molecular footprints of past external/internal signals)
  • mechanical property exploitation (MOs whose primary function is dependent on the mechanical/elastic properties of the condensate itself)
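
A controlled vocabulary of this kind lends itself to simple consistency checks. The sketch below is only an illustration, not a description of how PhaSePro curates or validates its data: it stores the eight class names listed above and reports any annotated class that is not part of the vocabulary. The function name `unknown_classes` and the case-insensitive matching are choices made for this example.

```python
# The eight functional classes listed above, stored in lower case.
FUNCTIONAL_CLASSES = frozenset({
    "activation/nucleation/signal amplification/bioreactor",
    "inactivation/separation/molecular shield",
    "protective storage/reservoir",
    "biomolecular filter/selectivity barrier",
    "sensor",
    "regulator of spatial patterns",
    "memory device",
    "mechanical property exploitation",
})


def unknown_classes(annotated):
    """Return annotated class names that are not in the vocabulary.

    Matching ignores case and surrounding whitespace (an assumption
    of this sketch); the input order is preserved.
    """
    return [
        name for name in annotated
        if name.strip().lower() not in FUNCTIONAL_CLASSES
    ]


# "storage" alone is not a vocabulary term, so it is reported.
problems = unknown_classes(["sensor", "Memory device", "storage"])
```

An empty result means that every annotated class matches a vocabulary term.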

Molecular interactions crucial to the formation of liquid condensates through LLPS are defined in the context of a purpose-built controlled vocabulary (CV). This CV is built on the review of Mittag et al. (PMID:30099026) and contains the following terms:

  • multivalent domain-motif interactions (e.g. SH3 domains and proline-rich motifs)
  • multivalent domain-PTM interactions (e.g. SH2 domain - pY, UBA domain - ubiquitin, domains recognising histones with specific modifications)
  • discrete oligomerization (via ordinary oligomerization domains; defined number of monomers/valency)
  • linear oligomerization/self-association (undefined number of monomers/valency)
  • coiled-coil formation
  • helix-helix interaction driven oligomerization (e.g. formation of helical bundles)
  • prion-like aggregation (typically Q/N rich regions)
  • formation of amyloid-like/cross-beta/kinked/stacked beta-sheet structures
  • protein-RNA interaction (often multivalent)
  • protein-DNA interaction (often multivalent)
  • simple coacervation of hydrophobic residues
  • complex coacervation (IDRs with high net charge and large global dimensions form condensed droplets with oppositely charged polymers)
  • electrostatic (cation-anion) interaction (typically claimed when blocks of opposite charges alternate)
  • cation-π (cation-pi) interactions
  • π-π (pi-pi) interactions
  • dipole-dipole interactions
  • RNA base pairing/RNA self-assembly
  • weak electrostatic or hydrophobic interactions between folded domains (like Pab1 RRMs without RNA)
  • gelation (formation of a system-spanning gel instead of condensed droplets)

After a comprehensive reading of the LLPS literature, we devised a custom-built CV that helps establish broad categories of LLPS cases based on the main molecular determinants and mechanisms that are pertinent to LLPS:

  • Membrane cluster
  • Partner-dependent
  • RNA-dependent
  • PTM required for LLPS
  • Domain-motif interactions involved
  • Discrete oligomerization involved

The experimental observations supporting the liquid state of the condensates are defined using a separate controlled vocabulary, built on the review of Mitrea et al. (PMID:30017918), which contains the following terms:

  • dynamic movement/reorganization of molecules within the droplet (e.g. FRAP, NMR)
  • dynamic exchange of molecules with surrounding solvent
  • morphological traits (e.g. round shape, fusion of droplets, wetting)
  • rheological traits (material properties e.g. viscosity, surface tension, molecular network mesh size)
  • sensitivity to 1,6-hexanediol
  • temperature-dependence
  • reversibility of formation and dissolution (with changes in environmental conditions)
  • other (none of the above, but supports the liquid material state)