Data Scientist - NLP

Ciitizen’s mission is to provide Earth’s 7.3 billion citizens with control of their complete health information and give them the choice of sharing it with whomever they want. Period. With this new ownership, patients can share their health history with caregivers, with clinicians for second opinions, and with companies and researchers who may hold the answer to their treatment.

Current “interoperability” initiatives leave the patient out of the picture and rely on armies of manual labor to extract key health insights from unstructured data—an expensive model that will not scale. And unlike today’s zero-sum “marketplace,” our point of view is that all stakeholders share in the value of the data: institutions that participate, app developers through their services, companies that leverage this data for development of potentially life-saving treatments, and patients—who have a strong incentive to share their complete, longitudinal health history with others.

Ciitizen has assembled an all-star team and is backed by strong investment, led by Andreessen Horowitz, Section 32, and Verily Life Sciences.

Job Overview
We are looking for a savvy NLP Engineer to join our growing Product and Engineering team. The hire will be responsible for designing and building our Clinical NLP engine using core NLP and machine learning, and for building domain-specific algorithms, with the goal of extracting relevant clinical context from unstructured data. The ideal candidate is an experienced NLP engineer who can think outside the box and has excellent problem-solving skills.

The NLP Engineer will be expected to use their prior experience with core NLP, clinical NLP (if any), linguistics, and machine learning to develop algorithms to power the Ciitizen NLP engine. They will also be expected to define the process and the tools and frameworks used to measure the performance and quality of the NLP engine, and to set a path for continuous improvement as more data is processed.

The individual must be self-directed, a self-learner, and comfortable exploring emerging technologies. The right candidate will be excited by the prospect of innovating on Ciitizen’s NLP engine to support our next generation of products and data initiatives.

  • Understand the clinical NLP context from different clinical domains and areas
  • Develop strategies for extracting clinical context out of clinical documents from different clinical areas
  • Leverage core NLP and existing clinical NLP frameworks like CLAMP to develop
  • Leverage cutting-edge NLP and machine learning technologies and frameworks such as Keras, TensorFlow, PyTorch, etc.
  • Train deep learning models with internal and external NLP datasets
  • Define the metrics to be used to measure the NLP engine performance
  • Design and build a feedback control loop system that measures the performance of the NLP engine and calculates precision, recall, F1 score, etc.
  • Independently work on end-to-end development of NLP models to derive insights from research publications, legal documents, regulatory requirements, etc.
  • Lead NLP projects and develop models in collaboration with members of the Ciitizen Data Refinery team
  • Mentor junior members of the team
  • Work with stakeholders to refine requirements and communicate progress
  • Work with the team to develop a system for semantic search and knowledge graph creation
  • Deploy models to production and monitor performance
  • Develop original ideas to create cognitive systems 
  • Participate in internal and external forums

  • Extensive experience in applying different NLP techniques to problems such as primary context extraction, assertion, temporal context abstraction, sentence summarization, question answering, and knowledge extraction
  • Expertise in NLP methods such as LSA, LDA, Semantic Hashing, Word2Vec, LSTM, BiDAF, etc.
  • Strong command of linear algebra and statistics, with the ability to quickly translate ideas into efficient, elegant code
  • Development experience in Python or Java/Scala, with a good command of the respective data pipelining, matrix algebra, and statistics libraries
  • Experience with frameworks like Stanford CoreNLP, with a good command of Semgrex
  • Deep learning programming experience with Python/TensorFlow or a similar library in a GPU environment
  • Experience working with external reference datasets like SQuAD, SemEval, MSRP, WikTable, WikiQA, AllenAI, etc.
  • Tuning and optimization of sequential deep learning models
  • Experience with scikit-learn, pandas, NumPy, TensorFlow, and Jupyter
  • Experience using Stanford CoreNLP, spaCy, or similar
  • Practical experience with text annotation frameworks (e.g. Brat)
  • Proficiency with regular expressions
  • Experience with Elasticsearch or similar
  • Experience with graph databases (e.g. Neo4j)
  • Familiarity or practical background in semantic parsing 
  • Hands-on experience with, or working knowledge of, clinical NLP highly desired
  • Model deployment and scaling experience
  • 5+ years of NLP experience
  • MS in Computer Science with NLP specialization preferred

Palo Alto, CA
