FindZebra: a specialised search engine for rare diseases

When you hear hoof beats think of horses, not zebras

Dr Theodore Woodward


Every doctor is taught to think of the common causes of a problem before contemplating the rare ones. This works for the majority of patients, but occasionally the cause is something unusual – a “zebra” – and isn’t so easy to find. For rare disease patients a specific diagnosis can be elusive: 46% receive at least one incorrect diagnosis and 20% wait over five years for a final diagnosis (Rare Disease UK, 2010).

Medical experts and non-experts alike use the internet for health-related searches. The ability to mine the increasing quantity of available information can help clinicians to diagnose difficult cases, and it is also common for patients to research their symptoms to try to find a diagnosis. However, standard search engines are not optimised to find results relevant to rare diseases: they generally rank results by popularity and can return irrelevant and untrustworthy results.

FindZebra is a specialist search engine designed specifically to help doctors diagnose rare diseases (Dragusin et al., 2013). FindZebra mines information only from reputable rare disease sources – OMIM, Orphanet, GARD, NORD and other smaller rare disease databases – to limit irrelevant results and reduce the time required for assessment. A search can be based on multiple phenotypes and reference points, and FindZebra will return near-match results as well as exact matches to account for missing or non-specific phenotypes. A recent addition to FindZebra is a separate list of genes associated with the search term. This provides rapid access to relevant genes for future genetic testing, or alternatively allows for the identification of diseases associated with mutations in a particular gene.
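To illustrate the idea of returning near matches alongside exact matches, here is a minimal Python sketch. It is not FindZebra's actual ranking method, which is not described here: the function `rank_by_overlap`, the scoring rule (the fraction of query phenotypes that a disease entry contains) and the placeholder disease and symptom names are all assumptions made purely for illustration.

```python
def rank_by_overlap(query_terms, index):
    """Rank disease entries by the fraction of query terms they match.

    Hypothetical scoring rule for illustration only: a score of 1.0 is an
    exact match on every query term, lower scores are near matches, and
    entries that match nothing are left out.
    """
    query = {term.lower() for term in query_terms}
    if not query:
        return []

    scored = []
    for disease, phenotypes in index.items():
        matched = query & {p.lower() for p in phenotypes}
        if matched:
            scored.append((disease, len(matched) / len(query)))

    # Highest score first; ties are broken alphabetically by disease name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Placeholder index: the names below are invented, not real database entries.
toy_index = {
    "disease_a": {"symptom_1", "symptom_2", "symptom_3"},
    "disease_b": {"symptom_1", "symptom_4"},
    "disease_c": {"symptom_5"},
}

results = rank_by_overlap(["symptom_1", "symptom_2"], toy_index)
print(results)
```

In this toy index, disease_a matches both query terms and ranks first, disease_b matches only one and is returned as a near match, and disease_c is not returned at all. A search that tolerates partial matches in this way can still surface a disease when one of the entered phenotypes is missing from its description.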

Dragusin et al. (2013) compared the ability of FindZebra and Google to find disease-relevant information based on the symptoms of 56 difficult diagnostic cases. Within the top 20 results, FindZebra produced a relevant result in 68% of cases, compared to only 32% when using Google. Limiting Google to the same datasets used by FindZebra produced relevant information in only 11% of cases, demonstrating the unsuitability of the Google ranking system for this kind of query. FindZebra also performs significantly better than Google, which is optimised for short queries, on long search queries such as complex medical histories.
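The "relevant result within the top 20" figures correspond to a measure often called success at k: the fraction of queries for which at least one relevant document appears among the first k results. The Python sketch below shows how such a figure could be computed. The function name `success_at_k`, the case identifiers and the document identifiers are hypothetical, and the toy data are not taken from the study.

```python
def success_at_k(rankings, relevant, k=20):
    """Fraction of queries with at least one relevant document in the top k.

    rankings: dict mapping a query id to its ordered list of document ids.
    relevant: dict mapping a query id to the set of relevant document ids.
    """
    if not rankings:
        return 0.0

    hits = 0
    for query_id, ranked_docs in rankings.items():
        wanted = relevant.get(query_id, set())
        # Only the first k results count towards a hit.
        if any(doc in wanted for doc in ranked_docs[:k]):
            hits += 1
    return hits / len(rankings)


# Invented toy data: four diagnostic cases and the documents returned for each.
toy_rankings = {
    "case_1": ["doc_3", "doc_7", "doc_1"],
    "case_2": ["doc_9", "doc_2"],
    "case_3": ["doc_4", "doc_5", "doc_6"],
    "case_4": ["doc_8"],
}
toy_relevant = {
    "case_1": {"doc_1"},
    "case_2": {"doc_5"},
    "case_3": {"doc_4"},
    "case_4": {"doc_8"},
}

print(success_at_k(toy_rankings, toy_relevant, k=2))  # 0.5
print(success_at_k(toy_rankings, toy_relevant, k=3))  # 0.75
```

With a cutoff of two results, only two of the four toy cases have a relevant document in range, giving 0.5. Widening the cutoff to three brings in the relevant document for case_1 and raises the score to 0.75, which shows why the choice of cutoff matters when comparing search engines.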

A Google search for individual BHD symptoms does not produce any immediately relevant results on the first page. The exception is a search for “fibrofolliculomas”, but as a highly BHD-specific term this is unlikely to be a common differential diagnostic search term. Combinations of BHD phenotypes yield more prominent relevant results. In comparison, a search on FindZebra for “spontaneous pneumothorax” lists BHD in the top results and has FLCN as the most relevant gene. On FindZebra, as with Google, a search for “renal cell carcinoma” is not specific enough to distinguish BHD from other hereditary RCC diseases such as VHL or TSC. However, adding “hybrid” or “multifocal” to the search puts BHD and FLCN in the top results. Again, combinations of phenotypes result in BHD being a top result.

One of the biggest differences between the results produced by Google and FindZebra is the ease of reading and assessment. Assessing a Google result means visiting a new page, where the relevant information can sometimes be difficult to find. FindZebra shows the detailed result, with the disease in the title, alongside the initial search results and highlights relevant information, making it easier to scan quickly for relevance and compatibility.

FindZebra was launched in 2013 and produced over 1,000,000 diagnostic hypotheses for 30,000 unique visitors in the first five weeks (Dragusin et al., 2013b), although no data have yet been published on the accuracy of these hypotheses. As more clinicians become aware of the benefits of using FindZebra, its user group will undoubtedly grow. Such a tailored search engine has great potential to aid the rapid diagnosis of numerous rare disease patients.

FindZebra is undoubtedly one of the fastest and most reliable ways to link crucial symptoms to causal genes… [and] has rapidly become an integrated part of our diagnostic set up for rare diseases.

Finn Cilius Nielsen, Center for Genomic Medicine, University of Copenhagen


