Q&A with Luka Wanjohi: Achieving More with Less, the Power of Statistics

Luka Wanjohi (left) works with Bramwel Wanjala (center) and Jolien Swankaert (right) on data collection in the field. The use of tablets and apps ensures that the right data is collected in the right format and helps eliminate errors, saving time once spent digitizing data collected with pen and paper. (photo: Nathan Ronoh/CIP)

Luka Wanjohi is a regional senior knowledge management associate at the International Potato Center in Uganda and works with the Sweetpotato Genetic Advancement and Innovative Seed Systems (SweetGAINS) project. He spoke with Regional Communications Specialist Vivian Atakos about how statistics are fueling breakthroughs in plant breeding by streamlining how information is collected and shared.

Q: When one thinks of plant breeding, statistics and data collection are not necessarily the first things that come to mind. What role does data management play in the development and adoption of improved plant varieties?

LW: Data can be seen as the heart and soul of research activities. SweetGAINS works in different countries with different partners, and everyone has a different way of collecting data and managing their research activities. Proper, standardized data management is required to ensure that we can succeed in using data to make our selection decisions. Plant breeders rely on data to make good selection decisions. Before a variety is released, we want to understand what we are replacing: why do we think that a new varietal release is necessary in a country?

Q: What are the risks of not systematizing the data generated by a modern breeding program?

LW: If we’re not collecting our data and defining the traits that we collect in a standardized way, then it becomes difficult for us to share data. You’ll also find that when you don’t have a systematic way of managing your data, people who were not there when that data was being generated might find it difficult to work with it. Lastly, not systematizing your data means that you’re running the risk of introducing a lot of errors, losing data or collecting corrupted data. This makes it very difficult for you to use this data without doing what we call data curation, an expensive and tedious process. Even after curation, some data will be lost or unusable.

Standardizing data collection and management helps plant breeders build on each other’s work. In the past, data collection was not standardized from project to project and site to site, resulting in misinterpretation or costly losses of time. (photo: Isabel Corthier/CIP)

Q: Why is it important to implement Standard Operating Procedures for data management in plant breeding?

LW: The clear benefit that we see from implementing these Standard Operating Procedures (SOPs) is that we are able to set clear standards for high-quality data, starting with the way we design our trials. Our approach to digitalization covers how we collect data, what improved equipment we use, and so forth. The other benefit is that we ensure the data we collect is comparable today and tomorrow, across the different environments and countries where we’re working.

Q: What are the challenges of not standardizing data across a program?

LW: If we want to do a joint analysis in the future, we need standardized naming systems. To tell an advanced trial from a preliminary trial, we use the same ontology. For example, some time ago, we wanted to look at how sweetpotato virus disease is affecting our plants, but it was a challenge because in the past one group might score on a scale of one to five, based on the severity of the effect of a virus on the plant. Then we’d find another program scored on a scale of one to three, and another would use a scoring system of one to nine. When we talk of the ontology, we are standardizing these scores so that they are the same across programs. We also use BreedBase to centralize the storage of all our data, which enables us to track trial data across different (program) transitions and pick out potential problems early on in our work, as opposed to discovering that we collected or stored the wrong data right at the very end, when we want to do an analysis.
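
To make the scoring problem concrete, here is a minimal Python sketch of bringing scores recorded on one-to-five, one-to-three and one-to-nine scales onto a single one-to-nine scale. It assumes a simple linear rescaling, which is an illustrative choice and not necessarily how the project's ontology maps scores; the function name, program names and sample values are all hypothetical.

```python
def rescale_score(score, src_min, src_max, dst_min=1, dst_max=9):
    """Linearly map a score from [src_min, src_max] onto [dst_min, dst_max].

    Assumption: severity grows evenly along both scales. A real ontology
    may instead define each score by a written descriptor.
    """
    if not src_min <= score <= src_max:
        raise ValueError(f"score {score} is outside the {src_min}-{src_max} scale")
    fraction = (score - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Hypothetical records: (program, scale minimum, scale maximum, recorded score)
records = [
    ("program_a", 1, 5, 3),
    ("program_b", 1, 3, 2),
    ("program_c", 1, 9, 5),
]

# After rescaling, every program reports on the same 1-9 scale.
harmonized = {
    name: rescale_score(score, low, high) for name, low, high, score in records
}
print(harmonized)
```

In this example the three midpoint scores all land on the same harmonized value, which is what makes a joint analysis across programs possible. Agreeing on one scale before data is collected avoids the need for this kind of after-the-fact conversion.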

Q: How do data collection and management help breeders target preferred traits, streamline their work and drive down overall costs?

LW: An efficient data management process supports the breeder with quick, cost-effective data collection that allows them to collect a lot of data, analyze it, and make selection decisions to support their work. Breeders are normally working with a lot of accessions. I have seen trials in Uganda with about 7,000 entries, which amounts to 7,000 plots. When you have a large number of accessions, you want to be able to go in and collect your data quickly and efficiently. Putting in a process that allows you to do this digitally avoids paper entries, which run the risk of transcription errors during digitization. Our tools have built-in checks to ensure we are collecting the right data by restricting data entry as much as possible, so that only the allowable values can be collected. This allows for quick data analysis, because the data is already digitized, and for quick, easy sharing with colleagues and other people who are helping with the data analysis.
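
As an illustration of what a built-in check can look like, the following Python sketch rejects any value that falls outside the allowed set for a trait at the moment of entry. It is not the code of the project's field tools; the trait names, the one-to-nine ranges and the function names are assumptions made for this example.

```python
# Hypothetical trait definitions: each trait lists the only values it accepts.
ALLOWED_VALUES = {
    "virus_severity": set(range(1, 10)),  # assumed 1-9 scale
    "vine_vigor": set(range(1, 10)),      # assumed 1-9 scale
}


def validate_entry(trait, value):
    """Return the value if it is allowed for the trait, otherwise raise."""
    if trait not in ALLOWED_VALUES:
        raise KeyError(f"unknown trait: {trait}")
    if value not in ALLOWED_VALUES[trait]:
        allowed = sorted(ALLOWED_VALUES[trait])
        raise ValueError(f"{value!r} is not allowed for {trait}; expected one of {allowed}")
    return value


def record_observation(observations, plot_id, trait, value):
    """Store an observation only after it passes validation."""
    observations.setdefault(plot_id, {})[trait] = validate_entry(trait, value)


observations = {}
record_observation(observations, "PLOT-0001", "virus_severity", 3)

try:
    # A typo such as 33 instead of 3 is caught in the field, not at analysis time.
    record_observation(observations, "PLOT-0001", "vine_vigor", 33)
except ValueError as error:
    print("Rejected:", error)
```

Because the invalid value never reaches the stored observations, there is nothing to curate later for that kind of error.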

Q: Tell us about how the use of barcodes is helping to streamline information.

LW: We have been using a lot of barcoded labels for some time now. Programs are encouraged to make sure that they print labels for their materials in the screenhouses and out in the field. The essence of a barcoded label is to be able to track the movement of a given accession or a given genotype across different work fields. BreedBase, our database, generates a unique ID for every plot that you put out in the field, and we put this ID on the barcode, so we are able to track the performance of a given entry from the field to the harvest, and on to the lab when we bring it in for analysis, and so on.
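
The idea of following one plot through several stages by its ID can be sketched in a few lines of Python. This is only an illustration: the ID format, the trial name and the `PlotTracker` class are hypothetical and do not reflect how BreedBase generates or stores its identifiers.

```python
from collections import defaultdict


def make_plot_id(trial, plot_number):
    """Build a hypothetical unique plot ID, the text a barcode would encode."""
    return f"{trial}-PLOT{plot_number:04d}"


class PlotTracker:
    """Collects every scan of a plot ID, in the order the scans happen."""

    def __init__(self):
        self._events = defaultdict(list)

    def scan(self, plot_id, stage, **data):
        # Each scan ties a stage and its measurements to the same plot ID.
        self._events[plot_id].append({"stage": stage, **data})

    def history(self, plot_id):
        # Return a copy so callers cannot alter the stored record.
        return list(self._events.get(plot_id, []))


tracker = PlotTracker()
plot_id = make_plot_id("UG-ADV-TRIAL", 7)  # hypothetical trial name

tracker.scan(plot_id, "field", virus_severity=3)
tracker.scan(plot_id, "harvest", root_yield_kg=12.5)
tracker.scan(plot_id, "lab", dry_matter_pct=31.0)

for event in tracker.history(plot_id):
    print(plot_id, event)
```

Because every stage scans the same label, no one has to retype a plot name, and the field, harvest and lab records line up without manual matching.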

Q: What advice do you have for other organizations looking to implement standard operating procedures in their breeding programs?

LW: We have learned that our SOPs are a living document. Their development was needed because once you have a reference document, it becomes easy for people to come back and look at it, and easy to support people who are scattered all over the region. My advice to others would be that SOP development should be part and parcel of the activities you’re doing today. Start small and document the procedures as they are today. Iterate with the breeding teams, with the people who are actually implementing the breeding activities, and strive to continuously improve this document. Do not aim to have a perfect document ready. Start by writing one, and then, through the collaborative efforts of everyone on the team, look to capture all the processes and all the routine activities in the breeding program, so that over time you will have a document that is as inclusive as possible and captures all the activities that you are undertaking in your program.

Visit us on SoundCloud to listen to the SweetGAINS podcast featuring Luka Wanjohi and other scientists modernizing plant breeding.

This interview has been edited for length and clarity.