Posts Tagged ‘DNA’
abiogenesis – some amateur explorations

woteva
One of the greatest mysteries and challenges we face, as living beings – if we’re interested – is how living beings came to be. And we’re the only form of living beings, that we know of, asking this question. Hans Castorp, the central character of Thomas Mann’s The Magic Mountain, pondered the matter in his loggia while taking the cure in an alpine sanatorium. He even went further than the What is life question, asking What is matter? Why is there something rather than nothing?
It was a novel that changed my life. From that reading experience I turned, quite abruptly, to science. I bought Scientific American every month, until I switched to New Scientist, and started reading books by Richard Dawkins, Peter Atkins et al. Of course I’ve never undertaken any formal studies in science, and I’ve always preferred the informal to the formal, and not being subject to authorities telling me what to learn or know. That’s why Hans Castorp, reading and musing in his loggia, so appealed to me.
So what do we know on this subject? When did life begin on Earth, and how? It could have been close to 4 billion years ago, only half a billion years(!) after our planet was fully formed. We don’t have solid evidence, though. The earliest accepted evidence goes back 3.5 billion years, of ‘bacteria-like organisms’. That sounds pretty complex already, and presumably the ‘ingredients’, the intracellular material that sustained and motivated these beings, were around long before. Complexifying chains of molecules, formed out of the ‘primordial soup’, to use an unhelpful term. We think RNA and DNA of course, or at least nucleic acid chains. But what are nucleic acids, and what are the parts thereof? Other essential components include proteins and lipids, with the latter being essential to create more or less permeable boundaries between the organic and the inorganic (or proto-organic?). Lipid molecules, as the Arvin Ash video referenced below tells us, consist of a hydrophilic body, of sorts, and a hydrophobic tail. These molecules tend to come together to form spheres, with the outer, bulkier, hydrophilic ends joining together to protect or insulate the hydrophobic tails from the watery outer environment.
So there’s always a ‘what came before’ question. Where did these lipid molecules spring from, not to mention the other bits and bobs of life? Well, on lipids, I’m relying, for now, on the same video. Carbon monoxide (CO), hydrogen and minerals found in the Earth’s crust can combine to form lipids. All of these components can be found in the hydrothermal vents so recently found in the Pacific depths. But lipid structures break down in the presence of salt or magnesium ions, and these ions are essential for cellular and RNA development. Big problem, as the primeval oceans are believed to be more salty than those of today – though apparently we’re far from being certain about this. In any case, a 2019 paper from the University of Washington showed that lipid spheres remained intact in the presence of amino acids, the building blocks of protein molecules. To quote from the video,
The enclosing of amino acids within cell walls allows them to concentrate within those walls and interact with each other to form proteins, which are part of the ‘trinity’, one of the essential components of life.
So lipid cell walls and proteins, both of course non-living, require each other to survive in salty or ion-rich water. But what about the nucleic acids, DNA and RNA? These are the self-replicating molecules, the genetic material, or precursor genetic material. Today we know that RNA is created from DNA to build proteins according to DNA’s code, but the fact that RNA is the simpler of the two genetic materials suggests to most analysts that it came first. So there’s a hypothesis called the ‘RNA world’, which is generally well accepted by those in the field, but unfortunately we’ve made little progress in working out how RNA came to be formed.
RNA is made up of three chemical components – ribose (a sugar), the nucleobases, and phosphate. A ribose-base-phosphate unit links with other such units to form RNA polymer. But it’s not well understood how these links were formed, and they haven’t been successfully replicated in human experiments. The ribose-base link has proved particularly problematic. As Arvin Ash describes it, ‘this is because cells in your body require complex enzymes to bring RNA building blocks together before they combine to form polymers’. He describes one study, however, which found that today’s RNA could have formed on the surface of clays ‘which act like a catalyst to bring RNA bases together’. A later study showed that the building blocks of RNA could have polymerised in the early Earth, using organic molecules from meteorites and interplanetary dust in shallow ponds, where wet/dry cycles would have been conducive to such polymerisation. They considered that these polymers were probably present on Earth shortly after its formation.
So Ash describes a trinity – RNA, lipids and proteins. What about the proteins? We can go back to the Miller-Urey experiments of the 1950s, which showed that amino acids, the essential components of proteins, as well as other organic compounds, could be produced under particular atmospheric conditions, which they were able to replicate in the laboratory.
So, all these precursors might be explained, but they still need to combine for life as we know it, however basic. This is the big question that still needs to be answered. We haven’t discovered any precise mechanism, but oodles of time and incremental steps are probably required, and there is surely a possibility of this in the first billion or so years of our planet’s existence, wherein trillions of molecular interactions may have taken place. It’s something of a numbers game, something that many earlier theorists, and today’s creationists, have not taken sufficient account of. It’s also probable that the earliest life forms, those sparks, were so basic that they were quickly improved upon and rendered obsolete by – evolution. But that’s another story…
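To put a rough shape on that ‘numbers game’: if a single molecular encounter has some tiny chance p of producing a useful step, the chance of it happening at least once in n independent tries is 1 - (1 - p)^n. The sketch below is only an illustration of that arithmetic. The values of p and n are invented for the purpose and are not estimates from any study, and real chemistry is certainly not a series of independent coin tosses.

```python
import math

def chance_at_least_once(p: float, n: float) -> float:
    """Probability that an event with per-trial chance p happens at least once in n independent trials."""
    # Equivalent to 1 - (1 - p)**n, but log1p/expm1 keep precision when p is tiny and n is huge
    return -math.expm1(n * math.log1p(-p))

# Hypothetical figure, chosen only to show how scale changes the picture
p = 1e-12  # assumed one-in-a-trillion chance per interaction
for n in (1e6, 1e12, 1e13):
    print(f"{n:.0e} tries -> {chance_at_least_once(p, n):.6f}")
```

The point is just that a vanishingly unlikely event per try can become likely, even close to certain, once the number of tries is comparable to one over p.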
Needless to say, this piece was more or less wholly reliant on Arvin Ash’s excellent video, which I highly recommend.
References
Why is the Ocean So Salty?
introducing myself to abiogenesis, sort of

Yes, watch out for the creationists and their ultra ultra ultra male god…
So, more sciencey stuff by a non-scientist, this time on how life came about from non-life, and where exactly the boundary lies. I seem to recall, years ago, that Craig Venter, something of a maverick biochemist, or whatever, was competing with the ‘official’, i.e. government-funded, program, to map the human genome, and it might’ve come out as a tie, but don’t quote me. And then Venter and Co went on to work on abiogenesis, and then I lost touch…
I was reminded of all this when I watched a video featuring a Christian fundamentalist and biochemist, James Tour (I keep thinking James Tool) and his fight with mainstream biochemists on the difficulty/impossibility of life coming from non-life, because, of course, God – or as Americans like to call him, Guard, because, as we know, Guard blesses America, and safeguards Him (because, as we know, America is as fundamentally male as Guard) on an ongoing basis.
In googling Mr Tour, the first thing I came up with was ‘Is James Tour religious?’ The answer, of course, is another question – Do bears shit in the woods?
But let’s not get too lazy by mocking US silliness ad nauseam. In the video, Tour is shown violently lashing out at claims that there is any possible chemical pathway for something living – that’s to say self-sustaining – to have come from something purely chemical, no matter how complex. And yet, in spite of Tour’s noisy, over-the-top attacks on the whole abiogenesis program, presumably because it was ‘playing Guard’, in the end, when talking to a sympathetic and doubtless Christian interviewer, he admitted that we might one day work out the process that sparked life, in spite of its ‘infinite’ (or near-infinite) complexity, because, after all, Guard is infinite (or near-infinite?)….
I suspect he might regret that admission.
So, after all that, how are we going on the abiogenesis front? First, a little history. Spontaneous generation was once considered very much a thing, in the days before microscopes and such, and this is unsurprising, as I myself have seen maggots ‘suddenly’ infesting something rotting in a cupboard in my lazy house-sharing youth. Such situations caused considerable debate in earlier centuries, until better technologies and experiments, in particular the work of Louis Pasteur, finally disproved the concept. But this, of course, left a gap – if there was no spontaneous generation of life, and evolution by natural selection had nothing to say on the subject, then – maybe Guard? Or Guard of the Gap?
But enough of Guard, we already have complex collections of molecules, such as viruses, which seem to bridge the gap between life and non-life through their ability to replicate rapidly under particular conditions – but not independently. According to the RationalWiki on the subject:
Abiogenesis is not a single step event, but a process. Biological life has the properties or capabilities of organization, metabolism, homeostasis, growth, reproduction, response, and evolution.
So, it’s generally considered likely that abiogenesis cannot be sheeted home to one semi-miraculous event – more likely there were various combinatorial chemical developments that more or less succeeded in maintaining the above-mentioned properties. At some stage in this process, a stable life-form emerged that combined these ingredients effectively. This life-form has been dubbed the last universal common ancestor (LUCA).
Three elements appear to be essential – carbon, and hydrogen and oxygen in the form of water. The compounds focussed on by biochemists studying the subject are lipids, which can form membranes, carbohydrates, which can provide energy, amino acids, and nucleic acids (DNA and RNA) for reproduction.
I’m fairly clueless, so I’ll start with amino acids. Wikipedia tells me they’re essential for ‘protein metabolism’, but apparently not all amino acids are involved in this process – far from it. Of the more than 500 amino acids that we know to exist, there are only 22 that are ‘incorporated into proteins’, and 20 of those are specified directly by the standard genetic code shared by all life. They’re called proteinogenic amino acids, and they all belong to the class of α-amino acids (alpha amino acids).
But what exactly is an amino acid? Obviously it’s an acid, which we tend to think is something negative that breaks down and destroys stuff. But then amino makes me think of animation, in a scrambled sort of way. I mean, life? They are described as organic molecules, or organic compounds after all. Why? Apparently, for many biochemists an organic compound is one containing carbon. The proteinogenic amino acids are the ‘raw material’ assembled by our ribosomes (by the ribosomes of all living cells?) into the multitudinous peptides and proteins that do so much mysterious work throughout our bodies. I’m getting most of this from Wikipedia, a fantastic resource that just keeps getting fantasticker. Its article on abiogenesis is itself virtually book-length, and the links take you to dozens of other useful and lengthy articles.
So how did amino acids come into being? Before ribosomes, the amino acid-assembling machines in our cells, came into being, that is. Well, first we needed the elemental ingredients, and they existed billions of years ago, at the Earth’s formation, and even before the Sun had coalesced into the star we know today. In a PubMed article abstract, ‘The origin of the biologically coded amino acids’ (that’s to say the proteinogenic ones), the problem/solution is put this way:
The types of amino acids produced depend on the conditions which prevailed at the time of synthesis, which remain controversial. The selection of the biological set is likely due to chemical and early biological evolution acting on the environmentally available compounds based on their chemical properties. Once life arose, selection would have proceeded based on the functional utility of amino acids coupled with their accessibility by primitive metabolism and their compatibility with other biochemical processes.
So, before there was biological evolution there was chemical evolution, which also may have been a matter of fits and starts. For example, some have speculated that carbonaceous meteorites raining down on the early Earth may have provided a spark, or a boost. These speculations are forward-looking from the non-living, in a sense, while another approach is backward-looking from known candidates for LUCA. Here’s how Wikipedia puts it:
It appears there are 60 proteins common to all life and 355 prokaryotic genes that trace to LUCA; their functions imply that the LUCA was anaerobic with the Wood–Ljungdahl pathway, deriving energy by chemiosmosis, and maintaining its hereditary material with DNA, the genetic code, and ribosomes. Although the LUCA lived over 4 billion years ago (4 Gya), researchers believe it was far from the first form of life. Earlier cells might have had a leaky membrane and been powered by a naturally occurring proton gradient near a deep-sea white smoker hydrothermal vent.
I won’t pretend I understand all that, but prokaryotes are unicellular organisms, and anaerobic respiration utilises ‘electron transport chains’ whose final electron acceptors are something other than – and less efficient than – oxygen. The Wood-Ljungdahl pathway is, inter alia, a proposed mechanism – still controversial – for the anaerobic prokaryotic life found at deep sea alkaline hydrothermal vents, in the late 1970s.
The key problem, it seems to me, is that of effective replication, way back in the day. DNA and RNA are both very complex molecules, and so they didn’t just spring into existence. The Wikipedia article articulates the problem in a sentence that’s easy to simply overlook:
Prebiotic synthesis creates a range of simple organic compounds, which are assembled into polymers such as proteins and RNA.
We’re still quite a way from understanding that ‘assembly’ stage, though we’ve managed a bit of prebiotic synthesis, but there’s no reason to assume that we can’t work it all out. Now if we could find simple, perhaps differently-organised life or proto-life on other planets or moons…
That’s astrobiology, apparently. And Wikipedia can explain it all better than me, so excuse my laziness.
The 2015 NASA strategy on the origin of life aimed to solve the puzzle by identifying interactions, intermediary structures and functions, energy sources, and environmental factors that contributed to the diversity, selection, and replication of evolvable macromolecular systems, and mapping the chemical landscape of potential primordial informational polymers. The advent of polymers that could replicate, store genetic information, and exhibit properties subject to selection was, it suggested, most likely a critical step in the emergence of prebiotic chemical evolution. Those polymers derived, in turn, from simple organic compounds such as nucleobases, amino acids, and sugars that could have been formed by reactions in the environment. A successful theory of the origin of life must explain how all these chemicals came into being.
Hoping to write about this more in the future, exploring new developments, if any.
References
https://rationalwiki.org/wiki/Abiogenesis
https://en.wikipedia.org/wiki/Abiogenesis
understanding genomics 3: SNPs and other esoterica
Canto: So SNPs are pretty essential to modern genomics I believe, so why, and what are they? I know that they’re ‘single nucleotide polymorphisms’ and that nucleotides are A, C, G, T and U, each of which has a slightly different structure. They’re all based on sugar structures – ribose in the case of RNA and deoxyribose in the case of DNA – attached to a phosphate group and a nitrogenous base. Here’s a diagram of thymine (T) filched from the USA’s National Human Genome Research Institute:

So that’s a nucleotide, one of the building blocks of DNA and RNA, but the real problem, for me anyway, is the connection between single and polymorphic, if there is one. I know that poly means many and that morphology is about shape and size and such….
Jacinta: You can only get so far with interrogating the words themselves. An SNP is a genetic variation in a single nucleotide between one person’s genome and another (I think). But there are many of these variations, which is where the ‘poly’ comes in. I’ll quote this from an NIH website, and then try to make sense of it:
SNPs occur normally throughout a person’s DNA. They occur almost once in every 1,000 nucleotides on average, which means there are roughly 4 to 5 million SNPs in a person’s genome. These variations occur in many individuals; to be classified as a SNP, a variant is found in at least 1 percent of the population. Scientists have found more than 600 million SNPs in populations around the world.
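As a rough illustration of what that definition amounts to, here is a minimal Python sketch that lines up a few sequences, finds the positions where they differ by a single base, and keeps only the sites where the less common base reaches a frequency threshold (1 percent by default, as in the quote). The four ‘genomes’ are invented ten-letter strings, not real data, and the function name is just for this example; real SNP calling works on millions of reads and is far more involved.

```python
from collections import Counter

def variant_sites(genomes: list[str], min_freq: float = 0.01) -> dict[int, dict[str, float]]:
    """Return positions where an alternative base reaches min_freq among aligned sequences."""
    length = len(genomes[0])
    if any(len(g) != length for g in genomes):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos in range(length):
        counts = Counter(g[pos] for g in genomes)
        if len(counts) < 2:
            continue  # everyone carries the same base at this position
        freqs = {base: n / len(genomes) for base, n in counts.items()}
        major = max(freqs, key=freqs.get)
        # keep the site only if some non-majority base is common enough
        if any(f >= min_freq for b, f in freqs.items() if b != major):
            sites[pos] = freqs
    return sites

# Toy data: four made-up sequences (an assumption for illustration only)
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTATGTAC", "ACGTACGTAC"]
print(variant_sites(sample))
```

With only four sequences any difference clears the 1 percent bar, which is why the threshold only becomes meaningful once many genomes from a population have been compared.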
Canto: So they’re called ‘variants’ because they vary from the ‘normal’ pattern in 1% or more of those whose genomes are mapped? So there’s such a thing as a ‘normal’ human genome, but perhaps everyone differs from that normal pattern due to different SNPs? And why is 1% the cut-off? Isn’t that a bit arbitrary? Also, it says that these variations occur in many individuals, which sounds a bit vague. Does this mean that there are many individuals where they don’t occur at all? I mean, what is a normal human genome, if there are so many variants? Is it just some kind of aggregated value?
Jacinta: Uhh, maybe. And note – but I’m not sure if this is relevant to your question – that these SNPs mostly occur in non-coding DNA, where they won’t be affecting the phenotype and its general functioning, though it seems to depend on how close they are to coding regions. Anyway, we’re just scratching the surface here. Look at this diagram, from Wikipedia.

As you can see, there are synonymous and non-synonymous SNPs. Synonymous with what, you might ask?
Canto: As a language teacher I know what a synonym is, obviously. My guess is that a synonymous SNP is associated with, ‘synonymous’ with, some kind of malfunction or defect, or maybe different function or effect. A ‘missense’, as the diagram suggests.
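Jacinta: No, it’s the non-synonymous SNPs that cause the problems. ‘Synonymous’ here means synonymous at the level of the protein: the changed codon still spells the same amino acid, so the protein comes out unchanged and the pathway isn’t affected. A non-synonymous SNP changes the amino acid (that’s the ‘missense’ in the diagram) or turns the codon into a premature stop signal (‘nonsense’), and since coding DNA is all about producing functional proteins, that’s where the trouble can start.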
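The synonymous/non-synonymous distinction comes down to looking a codon up in the genetic code before and after the change. Here is a minimal Python sketch of that lookup, using the standard genetic code table. The function name and the example codons are chosen just for illustration, and the sketch assumes the original codon is not itself a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop codon
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change within one codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"     # codon now reads as a stop signal
    return "missense"         # a different amino acid is inserted

print(classify_snp("GAA", 2, "G"))  # GAA -> GAG
print(classify_snp("GAG", 1, "T"))  # GAG -> GTG
print(classify_snp("TAC", 2, "A"))  # TAC -> TAA
```

Because the code is redundant (64 codons for 20 amino acids plus stop), many changes in the third position of a codon turn out to be synonymous.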
Canto: What I’m learning about genetics/genomics is that the more I delve into the subject, the more there is to learn, and yet I don’t really want to specialise, I want to know a bit of everything. I’ve just learned, for example, that it’s not just a divide between coding and non-coding DNA, because a mutation near a coding region can have effects, deleterious or otherwise, I think.
Jacinta: I don’t know about that, but I’m learning some interesting random facts, for example that there appear to be more C-G base pairings in coding DNA than T-A. Just to get it in our heads, cytosine (a pyrimidine) always pairs with guanine (a purine), and the other pyrimidine, thymine, always pairs with adenine. Always purines with pyrimidines, and purines are the larger molecules, with a two-ring structure, rather than one for pyrimidines. Note the structure of thymine, above. Anyway, back to SNPs, which we’re interested in mainly for what they might tell us about earlier populations. I’ve just glanced through a 2020 research article – generally way too technical for lay persons or dilettantes like us – titled ‘Genome-wide SNP typing of ancient DNA: Determination of hair and eye color of Bronze Age humans from their skeletal remains’. I did get some useful info from it though. The researchers compared the SNP method with ‘single base extension (SBE) typing’, and what they found was interesting enough:
The DNA samples were extracted from the skeletal remains of 59 human individuals dating back to the Late Bronze Age. The 3,000 years old bones had been discovered in the Lichtenstein Cave in Lower Saxony, Germany.
It seems that this was a kind of proof-of-concept piece of research, and they were able to obtain good to excellent results from two thirds of the skeletal samples:
With the applied technique, it was for the first time possible to get information about major phenotypic traits—eye and hair color—of an entire prehistoric population. The range of traits, varying from blonde to brown hair and blue to green-hazel eye colors for the majority of individuals is a plausible result for a Central European population.
Canto: Yes, that’s the exciting stuff – true it’s only going back 3000 years, and you could say that there were no surprises in the findings – but it brings the past back to life in such a vivid way… what can I say?
Jacinta: So you don’t want to know about haplotypes, and homozygous and heterozygous alleles? What’s wrong with you?
Canto: Okay, a haplotype – haven’t we gone through this? – a haplotype is a set of variants, or polymorphisms, along a single chromosome, involving one or more genes, that tend to stick together, inheritance-wise. We know that homozygous means you’ve inherited the same version of a genetic marker from both parents, whereas heterozygous means that you have a different version from each parent. A genetic marker is any ‘DNA sequence with a known location on a chromosome’. They may offer clues to inherited traits, such as diseases. All of this comes from the USA’s National Human Genome Research Institute, and I think I mostly understand it.
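For what it’s worth, those terms can be pinned down with a tiny Python sketch. A person carries two copies of each chromosome, one from each parent; at any one site the two copies either match (homozygous) or differ (heterozygous), and the run of alleles along one copy is a haplotype. The site names and alleles below are invented purely for illustration.

```python
def zygosity(allele_from_mother: str, allele_from_father: str) -> str:
    """Same allele from both parents -> homozygous; different alleles -> heterozygous."""
    return "homozygous" if allele_from_mother == allele_from_father else "heterozygous"

# Made-up alleles at three SNP sites along one chromosome pair (hypothetical names)
sites = ["snp_1", "snp_2", "snp_3"]
maternal_copy = {"snp_1": "A", "snp_2": "C", "snp_3": "T"}
paternal_copy = {"snp_1": "A", "snp_2": "T", "snp_3": "T"}

# A haplotype is the set of alleles sitting together on a single chromosome copy
maternal_haplotype = "".join(maternal_copy[s] for s in sites)
paternal_haplotype = "".join(paternal_copy[s] for s in sites)
print(maternal_haplotype, paternal_haplotype)

# Zygosity is decided site by site, by comparing the two copies
report = {s: zygosity(maternal_copy[s], paternal_copy[s]) for s in sites}
print(report)
```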
Jacinta: So SNPs can have all sorts of uses, regarding the present and the past, and tracing the present into the past, as with disease gene mapping. Their abundance within the genome has made them the go-to marker in bioinformatics. My guess, though, is we’ll never get to fully understand them without actually working with them. I mean, we can go through ScienceDirect, and jump from underlined term to underlined term (e.g. linkage disequilibrium, QTL mapping, PCR assays, point mutations and the like), but we’ll start to forget it all from the moment we have aha moments, because for us dilettantes, locked out of labs due to dumbness, shyness, laziness, poverty-ness etc, it’s all just book-larnin, sans even books. I suppose we just have to be grateful that we’ve, or they’ve, developed the technology to collect and analyse SNPs, to create libraries of them…
Canto: It seems like, as with so many fields, we’re at what Deutsch called ‘the beginning of infinity’ – but then didn’t they think that at the advent of string theory?
Jacinta: But we know this isn’t theory, this is about results. Tools producing results. Tools within the body, or rather natural phenomena made into tools by human ingenuity, like circles made into wheels, cubes into containers, triangles into struts. And we’re likely to get more and more out of DNA in the future. I recently learned about the petrous bone, though of course researchers have known about it for some years – it’s about the hardest part of the skull, down somewhere near the foramen magnum I think, and its density has, it seems, been a preservative for DNA – generally better than teeth. So that means more analysis of fossil collections. As David Reich puts it, technologies for analysing ancient DNA have created an explosion of information to rival the invention of the microscope/telescope a few hundred years ago.
Canto: Yes, some of the developments he mentions are next-generation sequencing (which has vastly reduced sequencing costs), more efficient DNA extraction methods, improvements in separating human from microbial DNA, and again the use of the petrous bone for extraction – a bone which tends to remain intact longer than others.
Jacinta: Okay, so we might continue to blunder on in trying to make sense of this genomics stuff, or maybe not. Enough for now.
References
https://www.genome.gov/genetics-glossary/Nucleotide
https://medlineplus.gov/genetics/understanding/genomicresearch/snp/
https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/point-mutation
https://en.wikipedia.org/wiki/Coding_region
https://onlinelibrary.wiley.com/doi/full/10.1002/ajpa.23996
understanding genomics 1 – mitochondrial DNA

Canto: So maybe if we got humans to mate with bonobos we’d get a more promising hybrid offspring?
Jacinta: Haha well it’s not that simple, and I don’t mean just physiologically…
Canto: Okay those species wouldn’t be much attracted to each other – though I’ve heard that New Zealanders are very much attracted to sheep, but that just might be fantasy. But seriously, if two species – like bonobos and chimps, can interbreed, why can’t bonobos and humans? And they don’t have to canoodle, you can do it like in vitro fertilisation, right?
Jacinta: Well, bonobos and chimps are much more closely related to each other than they are to humans. And if you think bonobo-human hybridisation will somehow create a female-dominant libertarian society, well – it surely ain’t that simple. What we see in bonobo society is a kind of social evolution, not merely a matter of genetics. But having said that, I’m certainly into exploring genetics and genomics more than I’ve done so far.
Canto: Yes, I’ve been trying to educate myself on alleles, haplotypes, autosomal and mitochondrial DNA, homozygotism and heterozygotism (if there are such words), single nucleotide polymorphisms and…. I’m confused.
Jacinta: Well, let’s see if we can make more sense of the science, starting with, or continuing with Who we are and how we got here, which is mostly about ancient DNA but also tells us much about the past by looking at genetic variation within modern populations. Let me quote at length from Reich’s book, a passage about mitochondrial DNA – the DNA in our mitochondria which is somehow passed down only along female lines. I’ve no idea how that happens, but…
The first startling application of genetics to the study of the past involved mitochondrial DNA. This is a tiny proportion of the genome – only approximately 1/200,000th of it – which is passed down from mother to daughter to granddaughter. In 1987, Allan Wilson and his colleagues sequenced a few hundred letters of mitochondrial DNA from diverse people around the world. By comparing the mutations that were different among these sequences, he and his colleagues were able to construct a family tree of maternal relationships. What they found is that the deepest branch of the tree – the branch that left the main trunk earliest – is found today only in people of sub-Saharan African ancestry, suggesting that the ancestors of modern humans lived in Africa. In contrast, all non-Africans today descend from a later branch of the tree.
Canto: Yes, I can well understand the implications of that analysis, but it skates fairly lightly over the science, understandably for a book aimed at the general public. To be clear, they looked at the same stretches of mitochondrial DNA in diverse people, comparing differences – mutations – among them. And in some there were many mutations, suggesting time differences, due to that molecular clock thing. And I suppose those that differed most – from who? – had sub-Saharan ancestry.
Jacinta: Dating back about 160,000 years, according to best current estimates.
Canto: The science still eludes me. First, how does mitochondrial DNA pass only through the female line? We all have mitochondria, after all.
Jacinta: Okay, I’ve suddenly made myself an expert. It all has to do with the sperm and the egg. One’s much bigger than the other, as you know, because the egg carries nutrients, including mitochondria, the only organelle in your cytoplasm that has its own DNA. Your own little spermatozoa are basically just packages of nuclear DNA, with a tail. Our mitochondrial DNA appears to have evolved separately from our nuclear DNA because mitochondria, or their ancestors, had a separate existence before being engulfed by the ancestors of our somatic or eukaryotic cells, in a theory that’s generally accepted if difficult to prove. It’s called the endosymbiosis theory.
Canto: So mitochondria probably had a separate, prokaryotic existence?
Jacinta: Most likely, which could take us to the development, the ‘leap’ if you like, of prokaryotic life into the eukaryotic, but we won’t go there. Interestingly, they’ve found that some species have mitochondrion-related organelles with no genome, and our own and other mammalian mitochondria are full of proteins – some 1500 different types – that are coded for by nuclear rather than mitochondrial DNA. Our mitochondrial DNA only codes for 13 different types of protein. It may be that there’s an evolutionary process going on that’s transferring all of our mitochondrial DNA to the nucleus, or there might be an evolutionary reason for why we’re retaining a tiny proportion of coding DNA in the mitochondria.
Canto: So – we’ve explained why mitochondrial DNA follows the female line, next I’d like to know how we trace it back 160,000 years, and can place the soi-disant mitochondrial Eve in sub-Saharan Africa.
Jacinta: Well the term’s a bit Judeo-Christian (there’s also a Y-chromosomal Adam), but she’s the matrilineal most recent common ancestor (mt-MRCA, and ‘Adam’ is designated Y-MRCA).
Canto: But both of these characters had parents and grandparents – who would be somehow just as common in their ancestry but less recent? I want to know more.
Jacinta: To quote Wikipedia…
… she is defined as the most recent woman from whom all living humans descend in an unbroken line purely through their mothers and through the mothers of those mothers, back until all lines converge on one woman.
… but I’m not sure if I understand that convergence. It clearly doesn’t refer to the first female H sapiens, it refers to cell lines, haplogroups and convergence in Africa. One of the cell lines used to pinpoint this convergence was HeLa, the very first and most commonly used cell line for a multiplicity of purposes…
Canto: That’s the Henrietta Lacks cell line! We read The Immortal Life of Henrietta Lacks! What a story!
Jacinta: Indeed. She would be proud, if she only knew… So, after obtaining data from HeLa and another cell line, that of an !Kung woman from Southern Africa, as well as from 145 women from a variety of populations:
The published conclusion was that all current human mtDNA originated from a single population from Africa, at the time dated to between 140,000 and 200,000 years ago.
Canto: So mt-MRCA is really a single population rather than a single person?
Jacinta: Yeah, maybe sorta, but don’t quote me. The Wikipedia article on this gives the impression that it’s been sheeted home to a single person, but it’s vague on the details. Given the way creationists leap on these things, I wish it was made more clear. Anyway the original analysis from the 1980s seems to be still robust as to the time-frame. The key is to work out when all female lineages converge, given varied mutation rates. So, I’m going to quote at length from the Wikipedia article on mt-MRCA, and try to translate it into Jacinta-speak.
Branches are identified by one or more unique markers which give a mitochondrial “DNA signature” or “haplotype” (e.g. the CRS [Cambridge Reference Sequence] is a haplotype). Each marker is a DNA base-pair that has resulted from an SNP [single nucleotide polymorphism] mutation. Scientists sort mitochondrial DNA results into more or less related groups, with more or less recent common ancestors. This leads to the construction of a DNA family tree where the branches are in biological terms clades, and the common ancestors such as Mitochondrial Eve sit at branching points in this tree. Major branches are said to define a haplogroup (e.g. CRS belongs to haplogroup H), and large branches containing several haplogroups are called “macro-haplogroups”.
So let’s explain some terms. A genetic marker is simply a DNA sequence with a known location on a chromosome. A haplotype or haploid genotype is, as the haploid term suggests, inherited from one rather than both parents – in this case a set of alleles inherited together. SNPs or ‘snips’ are differences of a single nucleotide – e.g. the exchange of a cytosine (C) with a thymine (T). As to the rest of the above paragraph, I’m not so sure. As to haplogroups, another lengthy quote makes it fairly clear:
A haplogroup is…. a group of similar haplotypes that share a common ancestor with a single-nucleotide polymorphism mutation. More specifically, a haplogroup is a combination of alleles at different chromosomal regions that are closely linked and that tend to be inherited together. As a haplogroup consists of similar haplotypes, it is usually possible to predict a haplogroup from haplotypes. Haplogroups pertain to a single line of descent. As such, membership of a haplogroup, by any individual, relies on a relatively small proportion of the genetic material possessed by that individual.
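To make the SNP and haplotype idea a bit more concrete, here is a minimal Python sketch. The sequences are made up for illustration (they are not real mtDNA), and the function name find_snps is my own invention. It simply lists the positions where a sample differs from a reference by a single base, which is roughly what a 'marker' amounts to in the quotes above.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

# Made-up sequences, purely illustrative.
reference = "ACGTTAGCCTAG"
sample_a = "ACGTCAGCCTAG"  # C instead of T at position 4
sample_b = "ACGTCAGCCTGG"  # the same change, plus G instead of A at position 10

snps_a = find_snps(reference, sample_a)
snps_b = find_snps(reference, sample_b)
print(snps_a)  # [(4, 'T', 'C')]
print(snps_b)  # [(4, 'T', 'C'), (10, 'A', 'G')]

# A marker shared by both samples hints at a shared branch of the family tree.
shared = set(snps_a) & set(snps_b)
print(shared)  # {(4, 'T', 'C')}
```

In this toy case both samples carry the position-4 change, so they would sit on the same branch, with sample_b on a sub-branch defined by its extra change at position 10.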
Canto: Anyway, getting back to mt-MRCA, obviously not as memorable a term as mitochondrial Eve, it seems to be more a concept than a person, if only we could get people to understand that. If you want to go back to the first individual, it would be the first mitochondrion that managed to synthesise with a eukaryotic cell, or vice versa. From the human perspective, mt-MRCA can be best conceptualised as the peak of a pyramid from which all… but then she still had parents, and presumably aunts and uncles…. It just does my head in.
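One way to get a feel for this convergence is a toy simulation. The sketch below assumes a drastically simplified world: a constant number of women per generation, each picking her mother at random from the generation before. None of this comes from the Wikipedia article, and the function name and numbers are hypothetical. It traces every maternal line backwards until they have all merged into one.

```python
import random

def generations_to_common_mother(population_size, rng):
    """Trace every woman's maternal line backwards until only one line is left."""
    lineages = set(range(population_size))  # one maternal line per living woman
    generations = 0
    while len(lineages) > 1:
        # Each line picks a mother at random from the previous generation;
        # lines that pick the same mother merge into one.
        lineages = {rng.randrange(population_size) for _ in lineages}
        generations += 1
    return generations

rng = random.Random(1)  # fixed seed so repeated runs give the same numbers
results = [generations_to_common_mother(100, rng) for _ in range(20)]
print(results)
```

The point of the toy model is that the woman at the convergence point was one of a hundred women alive in her generation. The others also had descendants, but their unbroken mother-to-daughter lines eventually died out.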
References
https://www.genome.gov/genetics-glossary/Mitochondrial-DNA
https://en.wikipedia.org/wiki/Mitochondrial_Eve
https://en.wikipedia.org/wiki/Haplogroup
a DNA dialogue 6: Okazaki fragments, as promised

Canto: Okay, so first off, why are Okazaki fragments so called?
Jacinta: Well as anyone would guess, they’re named after someone Japanese, in this case two, the husband and wife team Reiji and Tsuneko Okazaki, who discovered these short, discontinuously synthesised stretches of DNA nucleotides in the 1960s.
Canto: Yes their story is intriguing – tragic but also inspiring. Reiji, the husband, was born in Hiroshima and died in 1975 from leukaemia, related to the 1945 A-bomb. He was only 44. Tsuneko Okazaki continued their research and went on to make many other contributions to genetics and molecular biology, as a professor, teacher, mentor and director of scientific institutes. Her achievements would surely make her a Nobel candidate, and she’s still alive, so maybe…
Jacinta: Now the key to Okazaki fragments is this lagging strand. Its directionality means that the DNA primase, followed by the DNA polymerase, must work ‘backwards’, away from the replication fork, to add nucleotides. This means that they have to have periodic breaks – but I’m not sure exactly why – in creating this lagging strand. So the entire replication process is described as semi-discontinuous because of this fundamental difference between the continuously created leading strand and the stop-start ‘fragmentary’ (at least briefly) lagging strand.
Canto: But we need to know why this ‘backward’ movement has to be stop-start, and I’d also like to know more about this primase and polymerase, thank you.
Jacinta: Well the Okazakis and their team discovered this semi-discontinuous replication process in studying the replication of good old Escherichia coli, the go-to research bacterium, and it was a surprise at the time. Now, I’m looking at the explanation for this necessarily discontinuous process in Wikipedia, and I confess I don’t really understand it, but I’ll give it a go. Apparently the Okazakis ‘suggested that there is no found mechanism that showed continuous replication in the 3′ to 5′ direction, only 5′ to 3′ using DNA polymerase, a replication enzyme’, to quote from Wikipedia. So they were rather cleverly hypothesising that there must be another mechanism for the 3′ to 5′ lagging strand, which must be discontinuous.
Canto: And another way of saying that, is that the process must be fragmentary. And they used experiments to test this hypothesis?
Jacinta: Correct, and I won’t go into the process of testing, as if I could. It involved pulse-labelling. Don’t ask, but it has something to do with radioactivity. Anyway, the test was successful, and was supported by the discovery shortly afterwards of polynucleotide ligase, the enzyme that stitches these fragments together. Now, you want to know more about primase, polymerase, and now ligase no doubt. So here’s a bit of the low-down. DNA primase is, to confuse you, an RNA polymerase, which synthesises RNA from a DNA template. It’s a catalyst in the synthesis of a short RNA segment, known as a primer. It’s extremely important in DNA replication, because no DNA polymerase (and you know how polymerase keeps getting associated with primase) can make anything happen without an RNA (or DNA) primer.
Canto: But why? This is getting so complicated.
Jacinta: I assure you, we’ve barely scratched the surface….
Canto: Well, Socrates was right – there’s an essential wisdom in being aware of how ignorant you are. We’ll battle on in our small way.
a DNA dialogue 5: a first look at DNA replication

Jacinta: So let’s scratch some more of the surface of the subject of DNA and genetics. A useful datum to remember, the human genome consists of more than 3 billion DNA bases. We were talking last time about pyrimidines and purines, and base pairs. Let’s talk now about how DNA unzips.
Canto: Well the base pairs are connected by hydrogen bonds, and the two DNA strands, the backbones of the molecule, run in opposite, or anti-parallel, directions, from the 5′ (five prime) end to the 3′ (three prime) end. So, while one strand runs from 5′ to 3′ (the sense strand), the other runs 3′ to 5′ (the antisense strand).
Jacinta: Right, so what we’re talking about here is DNA replication, which involves breaking those hydrogen bonds, among other things.
Canto: Yes, so that backbone, or double backbone whatever, where the strands run anti-parallel, is a phosphate-sugar construction, and the sugar is deoxyribose, a five-carbon sugar. This sugar is oriented in one strand from 5′ to 3′, that’s to say the 5′ carbon connects to a phosphate group at one end, while the 3′ carbon connects to a phosphate group at the other end, while in the other strand the sugar is oriented in the opposite direction.
Jacinta: Yes, and this is essential for replication. The protein called DNA polymerase should be introduced here, with thanks to Khan Academy. It adds nucleotides to the 3′ end to grow a DNA strand…
Canto: Yes, but I think that’s part of the zipping process rather than the unzipping… it’s all very complicated but we need to keep working on it…
Jacinta: Yes, according to Khan Academy, the first step in this replication is to unwind the tightly wound double helix, which occurs through the action of an enzyme called topoisomerase. We could probably do a heap of posts on each of these enzymes, and then some. Anyway, to over-simplify, topoisomerase acts on the DNA such that the hydrogen bonds between the nitrogenous bases can be broken by another enzyme called helicase.
Canto: And that’s when we get to add nucleotides. So we have the two split strands, one of which is a 3′ strand, now called the leading strand, the other a 5′ strand, called the lagging strand. Don’t ask.
Jacinta: The leading strand is the one you add nucleotides to, creating another strand going in the 5′ to 3′ direction. This apparently requires an RNA primer. Don’t ask. DNA primase provides this RNA primer, and once this has occurred, DNA polymerase can start adding nucleotides to the 3′ end, following the open zipper, so to speak.
Canto: The lagging strand is a bit more complex though, as you apparently can’t add nucleotides in that other direction, the 5′ direction, not with any polymerase no how. So, according to Khan, ‘biology’ adds primers (don’t ask) made up of several RNA nucleotides.
Jacinta: Again, according to Khan, the DNA primase, which works along the single strand, is responsible for adding these primers to the lagging strand so that the polymerase can work ‘backwards’ along that strand, adding nucleotides in the right, 3′, direction. So it’s called the lagging strand because it has to work through this more long, drawn-out process.
Canto: Yes, and apparently, this means that you have all these fragments of DNA, called Okazaki fragments. I’m not sure how that works…
Jacinta: Let’s devote our next post on this subject entirely to Okazaki fragments. That could clarify a lot. Or not.
Canto: Okay, let’s. Goody goody gumdrops. In any case, these fragments can be kind of sewn together using DNA ligase, presumably another miraculous enzyme. And the RNA becomes DNA. Don’t ask. I’m sure all will be revealed with further research and investigation.
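As a rough aid to picturing the lagging strand, here is a small Python sketch. It is a cartoon under stated assumptions: the template sequence is made up, the fragment length of 4 is absurdly short, the RNA primers are left out entirely, and the function names are hypothetical. The template is written 5′ to 3′, the direction the fork travels, so each newly exposed stretch has to be copied 'backwards', away from the fork, and the resulting pieces are joined afterwards, as ligase does.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Complementary strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def okazaki_fragments(lagging_template, fragment_length):
    """Return new-strand fragments in the order the fork exposes the template."""
    fragments = []
    for start in range(0, len(lagging_template), fragment_length):
        exposed = lagging_template[start:start + fragment_length]
        # The new piece is built 5'->3', starting at the fork end of the
        # exposed stretch and running back towards the previous fragment.
        fragments.append(reverse_complement(exposed))
    return fragments

template = "ATGCCGTAAGCT"  # made-up lagging-strand template, written 5'->3'
fragments = okazaki_fragments(template, 4)
print(fragments)  # ['GCAT', 'TACG', 'AGCT']

# 'Ligation': the most recently made fragment sits at the 5' end of the new strand.
ligated = "".join(reversed(fragments))
print(ligated)                                  # AGCTTACGGCAT
print(ligated == reverse_complement(template))  # True
```

The last line is the check that matters: stitching the fragments together gives exactly the strand you would get by copying the whole template in one go, which is why the stop-start process still ends up with a complete copy.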
References
Leading and lagging strands in DNA replication (Khan Academy video)
Epigenetics 8: some terms

The gene is not more ‘basic’ than the organism, or closer to ‘the essence of life’, whatever that means. Organisms have DNA codes, and they maintain external forms and behaviours. Both are equal and fundamental components of being. DNA does not even build an organism directly, but must work through complex internal environments of embryological development, and external environments of surrounding conditions. We will not know the core and essence of humanity when we complete the human genome project.
Stephen Jay Gould, ‘Magnolias from Moscow’, in Dinosaur in a Haystack, 1996
I remember ages ago promising that I’d start every blog piece with a quote, then I more or less immediately forgot about it. Anyway the above quote kind of refers to epigenetics, and anticipates, in a way, the disappointment that many have felt about the human genome project and its not-quite-revelatory nature. As we learn more about the complexities of epigenetics, more about the relationships between genotype and phenotype will be revealed, but the process will surely be very gradual, though relentless. But I can’t talk, knowing so little. In this post, I’ll look at a very few key terms to help orient myself in this vast field. Not all will be specifically related to epigenetics, but to the whole field of DNA and genetics.
nucleosome: described as ‘the basic structural form of DNA packaging in eukaryotes’, it’s a segment of DNA wound round a histone ‘octamer’, a set of eight histones in a cubical structure. All of this is for fitting DNA into nuclei. Nucleosomes are believed to carry epigenetic info which modifies their core histones, and their positions in the genome are not random. Each nucleosome core particle consists of approximately 146 base pairs.
chromatin: a complex of DNA and protein, which packages DNA protectively, condensing the whole into a tight structure. Histones are essential components of chromatin. Chromatin structure is affected by methylation and acetylation of particular proteins, which in turn affects gene expression.
nucleotides: the basic building blocks of DNA and RNA, they consist of a nucleoside and a phosphate group. A nucleoside itself is a nitrogenous base (also known as a nucleobase) and a five-carbon sugar, either ribose in RNA or deoxyribose in DNA (a ribose – these explanations always need more explaining – is a simple sugar, the natural form of which is D-ribose, and which comes in various structural forms). DNA and RNA are nucleic acid polymers made up of nucleotide monomers.
nucleobase: a nitrogenous base (e.g. adenine, cytosine, thymine, guanine, and uracil which replaces thymine in RNA), the fundamental units of our genetic code. Also simply known as a base.
base pairs: a base pair, in DNA, is one of the pairings adenine-thymine (A-T) or cytosine-guanine (C-G). They are pyrimidine-purine pairings. Adenine and guanine are purines, the other two pyrimidines. Due to their structure pyrimidines always pair with purines.
CpG islands: regions of DNA with a high frequency of CpG (C-G) sites, i.e. sites where a cytosine nucleotide is followed by a guanine nucleotide in linear sequence in a particular direction.
histones: highly alkaline proteins, the chief proteins of chromatin, and the means of ordering DNA into nucleosomes. There are four core histones, H2A, H2B, H3 and H4. Two copies of each form an octamer structure, around which approximately 146 base pairs are wound.
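The CpG definition above depends on order (a C followed by a G, reading in one direction), which is easy to miss. Here is a minimal Python sketch with made-up sequences and a hypothetical function name. It only counts CpG sites and is not a CpG island detector.

```python
def count_cpg_sites(sequence):
    """Count positions where a C is immediately followed by a G."""
    return sequence.upper().count("CG")

print(count_cpg_sites("TTCGCGACGTTACGCG"))  # 5
print(count_cpg_sites("GGCC"))              # 0: a G followed by a C doesn't count
print(count_cpg_sites("GCGC"))              # 1: only the middle C-G step counts
```

A region would then count as CpG-rich when this number is high for its length, though the thresholds used in practice are beyond this sketch.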
Obviously, I’m very much a beginner at comprehending all this stuff, but I note that the number of videos on epigenetics seems to increase almost daily, which is raising my skepticism more than anything. I try to be selective in checking out these videos and other info on the topic, as there’s always this human tendency to claim super-solutions to our problems, as in super-foods and super-fitness regimes and the like. I’m more interested in the how of things, which is always a more complicated matter. Other information sources tend to assume knowledge or to skate over obvious complexities in a facile manner, and then of course there’s the ‘problem’ of being a dilettante, who wants to learn more about areas of scientific and historical knowledge often far removed from each other, and time’s running out, and we keep forgetting…
So anyway, I’ll keep plodding along, because it’s all quite interesting.
epigenetics and imprinting 7: more problems, and ICRs
the only image I can find that I really understand
In the previous post in this series I wrote about the connection between two serious disorders, Angelman syndrome and Prader-Willi syndrome, their connection to a missing small section of chromosome 15, and how they’re related to parental inheritance. These syndromes can sometimes also be traced back to uniparental disomy, in which the section of chromosome 15 is intact, but both copies are inherited from the mother (resulting in PWS) or the father (resulting in AS).
So the key here is that this small section of chromosome 15 needs to be inherited in the correct way because of the imprinting that comes with it. To take it to the genetic level, UBE3A is a gene which is only expressed from the maternal copy of chromosome 15. If that gene is missing in the maternal copy, or if, due to uniparental disomy, both copies of the chromosome are inherited from the father, UBE3A protein won’t be produced and symptoms of Angelman syndrome will appear. Similarly, PWS will develop if a certain imprinted gene or genes aren’t inherited from the father. Other imprinting disorders have been found, for example, one that leads to Beckwith-Wiedemann syndrome, though the mechanism of action is different, in that both copies of a gene on chromosome 11 are switched on when only the paternal copy should be expressed. This results in abnormal growth (too much growth) in the foetus. It too has an ‘opposite’ syndrome, Silver-Russell syndrome, in which the relevant protein expression is reduced, resulting in retarded growth and dwarfism.
But now to the question of exactly how genes are switched on and off, or expressed and repressed. DNA methylation, briefly explained in my first post on this topic, is essential to this. Methyl groups are carbon-hydrogen compounds which can be bound to a gene to switch it off, but here’s where I start to get confused. I’ll quote Carey and try to make sense of it:
… it may be surprising to learn that it is often not the gene body that is methylated. The part of the gene that codes for protein is epigenetically broadly the same when we compare the maternal and paternal copies of the chromosome. It’s the region of the chromosome that controls the expression of the gene that is differently methylated between the two genomes.
N Carey, The epigenetics revolution, 2011 p140
The idea, I now realise, is that there’s a section of the chromosome that controls the part of the gene that codes for the protein and it’s this region that’s differently methylated. Such regions are called imprinting control regions (ICRs). Sometimes this is straightforward, but it can get extremely complicated, with whole clusters of imprinted genes on a stretch of chromosome, being expressed from the maternally or paternally derived chromosomes, and not simply through methylation. An ICR may operate over a large region, creating ‘roadblocks’, keeping different sets of genes apart, and affecting thousands of base-pairs, not always in the same way. Repressed genes may come together in a ‘chromatin knot’, while other, activated genes from the same region form separate bundles.
Imprinting is a feature of brain cells – something which, as of the writing of Carey’s book (2011), is a bit of a mystery. Not so surprising is the number of expressed imprinted genes in the placenta, a place where competing paternal-maternal demands are played out. As to what is going on in the brain, Carey writes this:
Professor Gudrun Moore of University College London has made an intriguing suggestion. She has proposed that the high levels of imprinting in the brain represents a post-natal continuation of the war of the sexes. She has speculated that some brain imprints are an attempt by the paternal genome to promote behaviour in young offspring that will stimulate the mother to continue to drain her own resources, for example by prolonged breastfeeding.
N Carey, The epigenetics revolution, 2011. pp141-2
This sounds pretty amazing, but it’s a new epigenetic world we’re exploring. I’ll explore more of it next time.
References
The epigenetics revolution, by Nessa Carey, 2011
a DNA dialogue 4: purines, mostly

Canto: So what’s a pyrimidine, molecularly speaking, and why does it differ from a purine, and why does it matter?
Jacinta: They’re two different types of nitrogenous bases, dummy, which are a subset, maybe, of nucleotide bases. All of which is largely gobbledygook at present.
Canto: Ok, we know there are four different nitrogenous bases in DNA. Two of them, A & G, adenine and guanine, are purines, which structurally have two fused carbon-nitrogen rings. The other two, thymine and cytosine, T & C, are pyrimidines, which have a single carbon-nitrogen ring. Uracil, in RNA, is also a pyrimidine. It replaces the thymine used in DNA.
Jacinta: That’s right, now we know that in DNA these nitrogenous bases are connected across the double helix, in pairs, in a particular way. A (a purine) always connects with T (a pyrimidine), and similarly C is always bonded to G. So why is this?
Canto: Why is it so? Well, put simply, the molecular structure of purines, which you’ll note have a two-ring structure and so are larger than pyrimidines, doesn’t allow them to bond within the group, that’s to say with other purines, and the same goes with pyrimidines. It’s essentially due to the difference between hydrogen bond donors and acceptors for these groups.
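The size part of that argument can be put as simple arithmetic. Here is a rough Python sketch that only counts rings; the names are my own, and it ignores the hydrogen bond donor and acceptor patterns, which is a big simplification.

```python
from itertools import combinations_with_replacement

# Purines (A, G) have two fused rings; pyrimidines (C, T) have one.
RINGS = {"A": 2, "G": 2, "C": 1, "T": 1}

def rings_across(pair):
    """Total number of rings spanning the 'rung' formed by two bases."""
    return sum(RINGS[base] for base in pair)

widths = {a + b: rings_across(a + b)
          for a, b in combinations_with_replacement("AGCT", 2)}
for pair, width in widths.items():
    print(pair, width)
```

Any purine-pyrimidine pairing comes out at three rings, purine-purine at four and pyrimidine-pyrimidine at two, so only the mixed pairings give rungs of a uniform width. Ring counting alone doesn't explain why A goes with T rather than C (A-C also comes to three rings); that's where the hydrogen bond donors and acceptors come in.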
Jacinta: So, looking at purines first, considering that they’re one of the building blocks of life, it’s not surprising that we find them in lots of the food we eat, especially in meat, mostly in organs like kidneys or liver. Structurally they’re heterocyclic aromatic organic compounds – as are pyrimidines. Heterocyclic simply means they have a ring structure composed of more than one element – in this case carbon and nitrogen. An aromatic compound isn’t quite what you think – structurally it means that it’s strong and stable, due to resonance bonds, which we won’t go into here. Below is a model of a purine molecule, which has the chemical formula C5H4N4 – the black globes are carbon atoms, the nitrogens are blue and the hydrogens white.

Purines and pyrimidines are both self-inhibiting and activating, so they actively bond with each other but inhibit self-bonding, so that they maintain a more or less equal amount as each other within the cell.
Canto: So that’s purines in general, but in DNA there are two purines, adenine and guanine, which must differ structurally – and are there any others?
Jacinta: Oh yes, caffeine is a purine, as well as uric acid…
Canto: Definitely aromatic.
Jacinta: And there are many others. Purines are very important molecules, used throughout the body for a variety of purposes, as components of ATP, cyclic AMP, NADH and coenzyme A, for example.
Canto: I’ve heard of some of those…
Jacinta: As to the difference between adenine and guanine, here’s how it’s described in this Research Gate article, which I’m sure is reliable:
The main difference between adenine and guanine is that adenine contains an amine group on C-6, and an additional double bond between N-1 and C-6 in its pyrimidine ring, whereas guanine contains an amine group on C-2 and a carbonyl group on C-6 in its pyrimidine ring
Canto: Shit, that explanation needs to be explained, please.
Jacinta: Haha well let’s look at more diagrammatic structures, but first – an amine group, also called an amino group, is a derivative of NH3 (ammonia), consisting of a nitrogen atom bonded to hydrogen atoms, at its simplest. This gives adenine the formula C5H5N5. Guanine has, in addition to the amine group, a carbonyl group, which is a carbon double bonded to an oxygen, C=O. This gives guanine the formula C5H5N5O. Anyway, it’ll all become clear over the next dozen or so years…
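The formulas above can be checked with a little atom bookkeeping. This Python sketch starts from the purine formula C5H4N4 and applies only the net changes in atom counts. It glosses over exactly which hydrogens sit where, and the helper name is hypothetical.

```python
from collections import Counter

purine = Counter({"C": 5, "H": 4, "N": 4})
hydrogen = Counter({"H": 1})
amine = Counter({"N": 1, "H": 2})  # the NH2 group
oxygen = Counter({"O": 1})         # the O of the carbonyl group

def formula(atoms):
    """Write an atom count as a formula, e.g. C5H5N5O."""
    return "".join(el + (str(n) if n > 1 else "") for el, n in sorted(atoms.items()))

# Adenine: an amine group takes the place of one hydrogen on the purine skeleton.
adenine = purine - hydrogen + amine
# Guanine: the same amine swap, plus (in net terms) one extra oxygen.
guanine = purine - hydrogen + amine + oxygen

print(formula(purine))   # C5H4N4
print(formula(adenine))  # C5H5N5
print(formula(guanine))  # C5H5N5O
```

So, counted this way, the only difference between the adenine and guanine formulas is the single oxygen of the carbonyl group, even though the amine group sits at a different position on the ring.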


References
https://www.researchgate.net/publication/316984935_Difference_Between_Adenine_and_Guanine
https://en.wikipedia.org/wiki/Purine
https://en.wikipedia.org/wiki/Adenine
a DNA dialogue 3: two anti-parallel strands

Jacinta: Ok so these two strands of DNA are described as anti-parallel. Is this just intended to confuse us?
Canto: Apparently not, in fact it’s quite essential. The useful q&a site Quora has good info on this, and understanding it in all its complexity should help us to understand DNA in general – it’s one of a thousand useful entry points.
Jacinta: Yes, and I’ll try to explain. It became clear to us last time that the strands or ribbons twisted round in a double helix, called the backbone of the molecule, are made from phosphate and deoxyribose sugar, covalently bonded together. That means tightly bonded. Between the two strands, connecting them like ladder rungs, are nitrogenous bases (this is new to us). That’s adenine, thymine, guanine and cytosine, bonded together – A always to T, and C to G – with weak hydrogen bonds. We’ll have to look at why they must be paired in this way later.
Canto: It’s called Chargaff’s base pairing rule, which doesn’t tell us much.
Jacinta: And, according to a respondent from Quora, ‘the two strands of DNA are anti-parallel to each other. One of them is called leading strand, the other is lagging strand’. But I don’t quite get this. How are there two strands of DNA? I thought there was one strand with two sugar-phosphate backbones, and a rung made up of two – nitrogenous nucleobases? – weakly connected by hydrogen bonds.
Canto: I think the idea is there are two strands, with the attached bases, one next to another on the strand, and weakly attached to another base, or set of bases each attached to another phosphate-sugar backbone. As to why the whole thing twists, rather than just being a straight up-and-down ladder thing, I’ve no idea. Clearly we’re a couple of dopey beginners.
Jacinta: Well, many of the Quora respondents have been teaching molecular biology for years or are working in the field, and just skimming through, there’s a lot to be learned. For now, being anti-parallel is essential for DNA replication – which makes it essential to DNA’s whole purpose if I can call it that. I’ll also just say that the sugars in the backbone have directionality, so that the way everything is structured, one strand has to go in the opposite direction for the replication to work. If for example the strands were facing in the same direction, then the base on one side would connect to a hydrophobic sugar (a good thing) but the base on the other side would be facing a hydrophilic phosphate (a bad thing). Each base needs to bond with a sugar – that’s to say a carbon atom, sugar being carbon-based – so one strand needs to be an inversion of the other. That’s part of the explanation.
Canto: Yes, I find many of the explanations are more like descriptions – they assume a lot of knowledge. For example one respondent says that the base pairs follow Chargaff’s rule and that means purines always pair up with pyrimidines. Not very helpful, unless maybe you’re rote-learning for a test. It certainly doesn’t explain anti-parallelism.
Jacinta: Well, although we don’t fully understand it yet, it’s a bit clearer. Anti-parallelism is an awkward term because it might imply, to the unwary, something very different from being parallel. The strands are actually parallel but facing in the opposite direction, and when you think about the structure, the reason for that becomes clearer. And imagining those backbone strands facing in the same direction immediately shows you the problem, I think.
Canto: Yes and for more insight into all that, we’ll need to look more closely at pyrimidines and purines and the molecular structure of the backbone, and those bases, and maybe this fellow Chargaff.
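For what it’s worth, the ‘parallel but facing in the opposite direction’ idea is easy to show with letters. This is a minimal Python sketch with a made-up six-base sequence and a hypothetical function name. Both strands are conventionally written 5′ to 3′, so the partner strand is the complement read backwards.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand_5_to_3):
    """Return the anti-parallel partner strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))

top = "ATGCCA"                # made-up sequence, 5'->3'
bottom = partner_strand(top)  # TGGCAT, also 5'->3'

# Lay them side by side: the bottom strand has to be flipped to line up its bases.
print("5' " + top + " 3'")
print("3' " + bottom[::-1] + " 5'")
```

Printed this way, each A sits opposite a T and each C opposite a G, but only because the second strand has been turned around. Taking the partner of the partner gives back the original strand.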
References