41 characteristics drawn from the 440 cells that fell, roughly, into four
shapes. Using those characteristics, the program accurately split the samples
into the proper clusters, matching their shapes. The researchers expect their
data analysis tool to help clinicians obtain meaningful patient groupings prior
to treatment. (Credit: Qutub Systems Biology Lab/Rice University)
Rice University bioengineers advance computing technique for
health care and more
(August 12, 2015) Rice
University scientists have developed a big data technique that could have a
significant impact on health care.
The Rice lab of bioengineer Amina Qutub designed an
algorithm called “progeny clustering” that is being used in a hospital study to
identify which treatments should be given to children with leukemia.
Details of the work appear today in Nature’s online journal Scientific Reports.
Clustering is important for its ability to reveal
information in complex sets of data like medical records. The technique is used
in bioinformatics — a topic of interest to Rice scientists who work closely
with fellow Texas Medical Center institutions.
“Doctors who design clinical trials need to know how to
group patients so they receive the most appropriate treatment,” Qutub said.
“First, they need to estimate the optimal number of clusters in their data.”
The more accurate the clusters, the more personalized the treatment can be, she
said.
Rice University graduate student Wendy Hu is leading the development of a new
technique to help clinicians obtain meaningful patient groupings when designing
trials for treatment of disease. Progeny clustering could have a significant
impact on health care and even
Separating groups by a single data point, like eye color,
would be easy, she said. But when separating people by the types of proteins in
their bloodstreams, it becomes more difficult.
“That’s the kind of data that’s become prevalent everywhere
in biology, and it’s good to have,” Qutub said. “We want to know hundreds of
features about a single person. The problem is identifying how to use all that
data.”
The Rice algorithm provides a way to ensure the number of
clusters is as accurate as possible, she said. The algorithm extracts
characteristics about patients from a data set, mixing and matching them
randomly to create artificial populations — the “progeny,” or descendants, of
the parent data. The characteristics appear in roughly the same ratios in
descendants as they do among the parents.
These characteristics, called dimensions, can be anything:
as simple as hair color or place of birth, or as detailed as one’s blood cell
count or the proteins expressed by tumor cells. For even a small population,
each individual may have hundreds or even thousands of dimensions.
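The "mixing and matching" step described above can be illustrated with a short Python sketch. It is not the Rice lab's published implementation. It assumes that each dimension of an artificial sample is drawn independently, with replacement, from the values observed among the parents, which keeps each characteristic in roughly the same ratio as in the parent data. The function name `make_progeny` and the toy data are hypothetical.

```python
import random


def make_progeny(parents, n_progeny, seed=None):
    """Build artificial samples ("progeny") from parent samples.

    Assumption for illustration: every dimension is sampled independently,
    with replacement, from the values seen among the parents.
    """
    rng = random.Random(seed)
    n_dims = len(parents[0])
    # Column view: all observed values for each dimension.
    columns = [[row[d] for row in parents] for d in range(n_dims)]
    return [
        tuple(rng.choice(columns[d]) for d in range(n_dims))
        for _ in range(n_progeny)
    ]


# Toy parent data: (eye color, hypothetical blood cell count).
parents = [
    ("brown", 4.1),
    ("brown", 5.0),
    ("blue", 4.6),
    ("green", 6.2),
]

progeny = make_progeny(parents, n_progeny=1000, seed=0)

# Half of the parents have brown eyes, so about half of the progeny should too.
brown_share = sum(1 for p in progeny if p[0] == "brown") / len(progeny)
print(f"share of brown-eyed progeny: {brown_share:.2f}")
```

Because the dimensions are drawn separately, a progeny sample can combine one parent's eye color with another parent's cell count. How the full method uses such progeny to judge the number of clusters is not covered by this sketch.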