Who is Responsible for Chemical Attacks in Syria? Guest Blog by Professor Paul McKeigue (Part 2)

Paul McKeigue now applies the method described in Part 1 of his guest blog to the events in Ghouta (2013) and Khan Sheikhoun (2017). Based on extensive research, a false flag hypothesis for each event is spelled out in some detail. Photographic evidence referred to is not included in this blog but is available in the public domain. Readers are advised that these hypotheses involve harrowing and disturbing considerations.


Paul McKeigue

Fake news, false flags and the weight of evidence favouring alternative explanations of alleged chemical attacks in Syria

In the last post I outlined the development of the mathematical and philosophical basis for using probability calculus to evaluate evidence.

The framework of probability calculus implies that:-

  • you cannot evaluate the evidence for or against a single hypothesis, only the weight of evidence favouring one hypothesis over an alternative
  • the weight of evidence favouring one hypothesis over another is based on comparing how well each of the two hypotheses would have predicted the observations
  • your assessment of how well a hypothesis would have predicted the observations does not in general depend on your prior degree of belief that this hypothesis is true

That’s as far as we can go without using numbers. As the objective of these posts is to show you how to evaluate evidence for yourself using simple back-of-the-envelope calculations, I’ll recapitulate how to do this.

  • How well a hypothesis would have predicted the observations is quantified by a number called the likelihood. This is calculated as the probability of the observations given that hypothesis. When the observations are fixed and we are comparing different hypotheses, we reverse this dependency and describe this number as “the likelihood of the hypothesis given the observations”. If you find this confusing, you’re not the only one. Likelihoods are not probabilities when they are used to compare hypotheses. “Support” would be a better word than “likelihood” (which in ordinary English is synonymous with probability).
  • The weight of evidence favouring one hypothesis over another is the logarithm of the ratio of the likelihoods. Weights of evidence can be added over independent observations. It’s convenient to use logarithms to base 2, so that the weights are expressed in bits.
  • If you make an assertion about the strength of evidence favouring one hypothesis over another, you are making an assertion about the conditional probabilities from which the ratio of likelihoods is calculated. These conditional probabilities (“expectations” would be a better word than “probabilities”) are based on subjective judgements. You can’t evaluate evidence without making these subjective judgements.
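The two rules above (weight of evidence as log2 of a likelihood ratio, and addition of weights over independent observations) can be sketched in a few lines of Python. This is a minimal illustration assuming only the standard library; the function names `weight_of_evidence` and `total_weight` are hypothetical, and the numbers in the usage lines are made up rather than taken from the post.

```python
import math


def weight_of_evidence(p_obs_given_a: float, p_obs_given_b: float) -> float:
    """Weight of evidence, in bits, favouring hypothesis A over hypothesis B.

    Each argument is the conditional probability of the observation under that
    hypothesis (its likelihood); the weight is log2 of their ratio.
    """
    return math.log2(p_obs_given_a / p_obs_given_b)


def total_weight(pairs):
    """Sum the weights over observations assumed independent given each hypothesis."""
    return sum(weight_of_evidence(p_a, p_b) for p_a, p_b in pairs)


# Illustrative numbers only: an observation four times as probable under A
# as under B contributes 2 bits favouring A.
print(weight_of_evidence(0.8, 0.2))              # 2.0
print(total_weight([(0.8, 0.2), (0.5, 0.25)]))   # 3.0
```

Working in bits makes the bookkeeping additive: each bit is a factor of 2 in the odds, so a rough total can be tallied by hand.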

If you have learned to think of probabilities as objective properties of physical systems, the modern subjectivist interpretation of probability as quantifying degree of belief may be hard to accept. Classical probability theory was based on situations like coin-tossing and throwing dice, where probabilities are imposed by physical symmetries. However, the rules for updating subjective probabilities in the light of evidence apply even when there are no such symmetries. One way to elicit your subjective probability of an event is as the price you would offer or accept for a ticket that will pay out £1 if the event occurs, and nothing if the event does not occur. If the prices you specify over various combinations of events are not consistent with probability theory, someone else can construct a “Dutch book” against you – a set of bets that guarantees that they will gain and you will lose.
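The simplest Dutch book involves just one event and its complement. If the prices you quote for a £1 ticket on each do not sum to £1, a counterparty can lock in a gain at your expense whatever happens. The sketch below covers only this two-ticket case; `guaranteed_loss` is a hypothetical helper written for illustration.

```python
def guaranteed_loss(price_a: float, price_not_a: float) -> float:
    """Sure loss (in £) a counterparty can impose when your prices for £1 tickets
    on an event A and on its complement do not sum to £1."""
    total = price_a + price_not_a
    if total >= 1.0:
        # Counterparty sells you both tickets: you pay `total`, and exactly one
        # ticket pays you £1 whatever happens.
        return total - 1.0
    # Counterparty buys both tickets from you: you receive `total`, and you must
    # pay out £1 on exactly one of them whatever happens.
    return 1.0 - total


print(round(guaranteed_loss(0.6, 0.6), 2))   # 0.2: prices sum to 1.2
print(round(guaranteed_loss(0.3, 0.5), 2))   # 0.2: prices sum to 0.8
print(guaranteed_loss(0.25, 0.75))           # 0.0: coherent prices, no sure loss
```

Only prices that add up like probabilities (here, summing to exactly £1) leave no room for a sure loss.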

Although probabilities are subjective, they are not plucked out of nowhere: they have to be consistent with what you already know, and with what you do not know. Usually there is some information that can be used to set conditional probabilities. For instance, in the examples discussed later, where some victims were injured after they were supposedly rescued, information on the frequencies of accidental injuries of these types in different settings can help us to specify the conditional probability of this observation given a hypothesis under which such injuries could only be explained as accidental. Where people differ in their assessment of the strength of evidence contributed by the same observations, eliciting these conditional probabilities will establish where their judgements differ.

We don’t need to get too hung up on the subjectivity of assessing how well a hypothesis would have predicted the observations. In the examples discussed in these posts, serious errors have arisen not because people’s assessments of these conditional probabilities were inconsistent with the available information, but because:-

  • relevant observations have been widely ignored (as we shall see in this section)
  • observations consistent with a hypothesis have been accepted as evidence supporting that hypothesis, without considering alternative hypotheses. An example is how the observation of Volcano rockets in the Ghouta incident was accepted as supporting the hypothesis of a regime attack, though the hypothesis of a “false flag” attack would have predicted this observation at least as well.
  • the evaluation of evidence favouring one hypothesis over another has been confused with assertions of prior belief about the plausibility of one of those hypotheses.

In the context of the Syrian conflict, it is difficult for independent-minded journalists and academics to propose explanations that differ from the official line without being heavily criticized for “speculating” or “conspiracy theorizing”. However, you cannot evaluate evidence for a hypothesis without specifying alternative hypotheses and computing the likelihoods of these hypotheses given the observations: this inevitably requires you to “speculate”.

In the discussion below I have linked to the sources of the observations used, but I have not embedded any images as the horrifying nature of some of these images would distract from the formalism of the argument. I am not appealing to your emotions but to your ability to use logic to evaluate evidence for yourself.

Weight of evidence for alternative hypotheses about the alleged chemical attack in Ghouta in 2013

At the end of the last post I listed four alternative hypotheses about the Ghouta event:-

  • H1: a chemical attack was carried out by the Syrian military, authorized by the government
  • H2: a false-flag chemical attack was carried out by the Syrian opposition to implicate the government
  • H3: an unauthorized chemical attack was carried out by a rogue element in the Syrian military
  • H4: there was no chemical attack but a managed massacre of captives, with rockets and sarin used to create a trail of forensic evidence that would implicate the Syrian government in a chemical attack.

The problem is to compute the likelihoods of these four hypotheses given the “dog did not bark” observation that no images of search and rescue operations were released:-

Under H1, H2 or H3, it is unlikely that no such images would have been made available. But how unlikely? To assess this, we have to envisage the scenario under H1. Procedures for urban search and rescue are well established. After each home has been searched, it is marked to record how many live victims were rescued and how many dead victims were found. If in eastern Ghouta the area affected covered only one square kilometre of housing, with 50 homes per hectare, there would have been 5000 homes to search. With at least 400 fatalities, we expect at least as many living but incapacitated individuals to have needed rescuing. The immediate priority would have been to rescue the living, leaving bodies to be removed later. Even if the operation began in the middle of the night soon after the alleged attack, we would expect it to have continued after daybreak.

Of at least 150 videos uploaded, badged as coming from 18 different media outlets, not one shows this search and rescue operation. Most show victims at morgues, hospitals or improvised medical stations: dead and living victims appear to have arrived at these medical stations in the middle of the night. Most people will agree that the probability of this observation given H1 is low: I’ll assign a value of 0.05. It is possible to calculate a number for this conditional probability based on some assumptions about the probability that the output of a single media outlet will include a search and rescue image, but I don’t claim that this is more than a (rather conservative) subjective judgement.

Under H4, the conditional probability of this observation is high – it would have been difficult to stage such operations without the cooperation of large numbers of civilians. We can set a value of 0.8 for the probability that no images of search and rescue operations would be uploaded. This gives a likelihood ratio of 16 (0.8/0.05): a weight of evidence of 4 bits favouring H4 over H1. There are other related observations that should be taken into account, for instance:
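The arithmetic for this single observation, using the two conditional probabilities assigned above (both subjective judgements stated in the text), is short enough to check by hand or in Python:

```python
import math

p_obs_given_h1 = 0.05  # no search-and-rescue images uploaded, if H1 were true
p_obs_given_h4 = 0.8   # the same observation, if H4 were true

likelihood_ratio = p_obs_given_h4 / p_obs_given_h1
weight_bits = math.log2(likelihood_ratio)
print(f"likelihood ratio = {likelihood_ratio:.0f}, weight = {weight_bits:.1f} bits")
```

A ratio of 16 is exactly 2 to the power 4, so this observation contributes 4 bits favouring H4 over H1.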

  • the observation that all victims were in day clothes though the alleged attack occurred at about 2 am
  • the obviously fraudulent videos of the “Zamalka Ghost House”, in which a group of adults and children apparently executed several days before the alleged chemical attack and placed in an unfinished building were presented as a family of victims found in situ.

Each of these observations would add one or two bits to the weight of evidence favouring H4 over H1, giving a total weight of evidence of about 7 bits given the related observations of no images of search and rescue, that the only such images uploaded were fraudulent, that bodies of victims apparently arrived without delay, and that these victims were fully clothed. Even if your prior odds favouring H1 over H4 were 1000 to 1 (about 10 bits), this weight of evidence would reduce the odds to about 8 to 1 (about 3 bits). There are other “dog did not bark” observations of non-occurrence of expected events related to the Ghouta incident that support H4 over H1: for instance under H1 we would expect to see interviews with bereaved survivors who would be able to document, with family photos, that they were relatives of victims seen dead in morgues.
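Converting a weight of evidence back into odds: each bit favouring H4 halves the odds favouring H1, so the update divides the prior odds by 2 raised to the weight. A short check using the two figures in the paragraph above (prior odds of 1000 to 1 and a weight of about 7 bits):

```python
prior_odds_h1_over_h4 = 1000.0  # example prior odds from the text
weight_bits_for_h4 = 7.0        # approximate total weight from the text

# Each bit of evidence favouring H4 halves the odds favouring H1.
posterior_odds_h1_over_h4 = prior_odds_h1_over_h4 / 2 ** weight_bits_for_h4
print(f"posterior odds H1:H4 = {posterior_odds_h1_over_h4:.1f} to 1")
```

1000 / 128 is about 7.8, i.e. roughly 8 to 1, or about 3 bits left of a prior that started at about 10 bits.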

We could proceed to evaluate the weights of evidence for other independent observations that have been made on the Ghouta event, and add them up. However there is one single observation for which I assess the weight of evidence favouring H4 over H1 to be so large as to overwhelm anything else.

The Kafr Batna morgue images

Some of the most harrowing images from the Ghouta incident were from a building identified as an old tuberculosis hospital in the suburb of Kafr Batna, in which living and dead victims were shown in a basement room, and at least 80 dead victims were laid out in a sunlit ground floor room (the “Sun Morgue”). A detailed study of the videos and still images from this site has been released online. This includes a detailed reconstruction of the fate of one victim in the Sun Morgue (pages 184-201). A short video summarizing this reconstruction has been released. The sequence of the videos and still images can be reconstructed from sun angles and from the order in which bodies are laid out and removed. The reconstruction shows that a heavily built male (given the code M-015 in this study) was brought into the morgue and laid on the floor apparently dead with no sign of bleeding. In later images M-015 had clenched his fists to grip his shirt, was bleeding from the neck, and a folded blanket had been placed under his head. In subsequent images the flow of bright red blood had continued, eventually saturating the blanket and spilling on the floor. At the end, when most of the bodies had been removed, the blood-soaked blanket remained. These images show that M-015 was not dead when brought into the morgue (dead people do not clench fists or bleed profusely). The only plausible interpretation of this image sequence is that M-015’s throat was cut when the morgue workers realized he was still alive.

I’ll now try to compute the likelihoods of H4 and H1 given this observation. Under H1 it is possible that a victim would be mistakenly declared dead and begin stirring in the morgue, but it’s almost impossible to explain why subsequent videos showed the victim bleeding bright red blood from the neck, or why the reaction of the emergency workers to someone who was obviously alive and bleeding profusely was to place a blanket under his neck and leave him to die.

The least implausible explanation I can come up with under H1 is that M-015 began stirring in the morgue, that somehow this led to an accident in which he was stabbed in the neck, and that the morgue staff, having no idea how to deal with this and afraid to report the accident, simply placed a blanket under his neck and left him to die. The probability of a patient being accidentally stabbed in the neck in a hospital setting, even in a chaotic response to a major incident, is extremely low. On the basis that I found no reports of such accidents in a brief search, I’ll put the risk of at least one such accident in Ghouta at less than 10⁻⁵. A botched medical procedure, such as an attempted insertion of a central venous catheter via the neck, is not a plausible explanation for the bleeding as there would have been no indication for such a procedure as the first response to someone apparently recovering consciousness, and no space around the patient was cleared to facilitate medical intervention. Based on a probability of 10⁻⁵ that a victim waking up in the morgue would be accidentally stabbed in the neck and a probability of 0.01 (given that under hypothesis H1 the morgue staff are genuine emergency workers) that the reaction of the staff to this accident would be to leave him to die, I compute the likelihood of H1 given this observation as 10⁻⁷. Maybe readers can come up with a better explanation of how this sequence of images could have occurred given hypothesis H1.

Under H4, which postulates that the Ghouta victims were massacred captives most likely killed in gas chambers and that the morgue staff were playing an active part in this operation, such an observation is not unexpected. The probability that in a massacre of more than 400 people at least one victim would survive the gas chamber and that those removing the bodies would fail to detect this is high (0.5). It is to be expected that they would kill such an individual as soon as he began stirring, and it is probable that the method chosen would be throat cutting rather than shooting or strangling (0.5). It’s also probable (0.4) that such an incriminating sequence of images would not be detected by those responsible for editing and uploading the videos and stills. Multiplying these conditional probabilities together gives a likelihood of 0.1. As we’ll see it won’t make much difference to the weight of evidence if these numbers vary by a factor of 2 or so. The weight of evidence is dominated by the very low likelihood of H1.

On this basis I evaluate the likelihood ratio favouring H4 over H1, given the observation of what appears to be a murder in the Kafr Batna morgue, as a million to one: a weight of evidence of 20 bits.
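The chain of calculations in the last three paragraphs can be reproduced as a product of conditional probabilities under each hypothesis, followed by a log2 ratio. The factor values are the subjective judgements given in the text; the short dictionary labels are added only for readability. The last lines check the remark that factor-of-2 changes in the H4 probabilities make little difference: each factor of 2 moves the weight by exactly 1 bit.

```python
import math

# Conditional probabilities as assigned in the text (subjective judgements).
h1_factors = {  # under H1: genuine emergency response
    "accidental neck wound after waking": 1e-5,
    "staff then leave the victim to die": 0.01,
}
h4_factors = {  # under H4: staged massacre
    "survivor goes undetected": 0.5,
    "killing method as seen in images": 0.5,
    "sequence not caught in editing": 0.4,
}

l_h1 = math.prod(h1_factors.values())  # about 1e-7
l_h4 = math.prod(h4_factors.values())  # 0.1
bits = math.log2(l_h4 / l_h1)
print(f"L(H1) = {l_h1:.0e}, L(H4) = {l_h4:.2f}, weight = {bits:.1f} bits")

# Sensitivity: halve every H4 factor, or double it (capped at 1), and recompute.
low = math.log2(math.prod(p / 2 for p in h4_factors.values()) / l_h1)
high = math.log2(math.prod(min(2 * p, 1.0) for p in h4_factors.values()) / l_h1)
print(f"if every H4 factor is off by a factor of 2: {low:.1f} to {high:.1f} bits")
```

The central value is about 19.9 bits, which the text rounds to 20; moving all three H4 factors by a factor of 2 in the same direction shifts it by 3 bits either way (about 16.9 to 22.9 bits).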

Evidence for alternative explanations of Khan Sheikhoun

We now turn to evaluating the evidence for alternative explanations of the alleged chemical attack on 4 April 2017 in Khan Sheikhoun. With Ghouta as a precedent, we can begin by defining just two alternative hypotheses:

  • H1: the Khan Sheikhoun incident was a chemical attack by the Syrian air force using sarin. The leading proponents of this hypothesis are the US, UK and French governments.
  • H2: the Khan Sheikhoun incident was a planned deception operation intended to bring about US military intervention, in which captives were killed in gas chambers, small quantities of sarin were used to generate a forensic trail and a large-scale media operation was undertaken to support the story of a chemical attack by the Syrian air force. The earliest proponents of this hypothesis were a group of contributors to the wiki A Closer Look on Syria. Under this hypothesis, Khan Sheikhoun is Ghouta version 2, and it is to be expected that a similar trail of evidence will be laid: purported eyewitnesses will describe the attack, videos will show victims purportedly being treated and bodies laid out in morgues, at least one alleged impact site will be shown with the remains of a munition, and both environmental and physiological samples will test positive for sarin.

As before, your prior beliefs about which of these two alternative hypotheses is correct need not prevent you from evaluating the weight of evidence favouring one hypothesis over the other. You may believe that H1 is highly implausible on the basis that the Syrian government had no motive to carry out such an attack, or you may believe that H2 is highly implausible on the basis that such an elaborate deception operation is beyond the capability of the Syrian opposition and their foreign allies. To evaluate the evidence favouring H1 over H2, you have to assess, for each hypothesis in turn, what you would expect to observe if that hypothesis were true. This means that you have to put yourself first in the shoes of a Syrian general planning a chemical attack on the town, and then in the shoes of an opposition commander planning a massacre that would implicate the Syrian government.

For hypothesis H2 we have to envisage how a clever and ruthless al-Qaeda commander, perhaps working with foreign help, would plan such an operation. Although it is disturbing to have to work through this, I’ll now state, as neutrally as I can, how I would expect such an operation to be planned.

  • Captives (most likely religious minorities or families of government supporters) would be held in readiness. Improvised explosive devices and possibly smoke generators could be placed at key locations in the town to panic the civilian population into believing they were under chemical attack. Low doses of sarin could be administered to volunteers so that they would test positive for exposure to sarin (the doses required to generate a positive test are far below those required to cause symptoms). Medical facilities controlled by jihadis would be ready to play their part by showing casualties, real or fake, being “treated”. A few actors could be prepared to play the part of bereaved parents, and provided with photos of children who were to be killed. Captives would be killed in improvised gas chambers, but the preferred agent would be an easily-available gas that leaves no residue, rather than sarin which would endanger those removing the bodies. A well-staffed video editing operation would be ready to edit the raw footage into clips and stills badged with the logos of various opposition media organizations. To make the video images so horrific that those viewing them would be shocked into supporting immediate retaliation against the Syrian government, the planners might choose that some children would not be killed outright by the gas but instead filmed struggling to breathe, before they were finished off by other methods.

In this framework, we can begin by evaluating the “mountain” of evidence – eyewitness reports, footage, crater, positive tests for sarin – that Monbiot invoked. Most of these observations were similar to those from Ghouta: purported eyewitnesses of the attack were made available for interview, images showing victims in morgues or improvised treatment facilities were uploaded, and samples tested positive for sarin. The likelihood of H1 given these observations is close to 1. Under H2, which specifies that Khan Sheikhoun was a repeat of Ghouta, we expect such observations, so the likelihood of H2 is also close to 1. The weight of evidence favouring H1 over H2 given these observations is therefore close to zero. You may have a strong prior belief that H2 is implausible, but that does not influence the likelihood ratio favouring H1 over H2.
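When both hypotheses predict an observation almost equally well, its weight of evidence is close to zero whatever the exact values. A sketch with hypothetical likelihoods near 1 (the values are illustrative, not taken from the text):

```python
import math

# Hypothetical likelihood pairs (P(obs | H1), P(obs | H2)), both close to 1.
pairs = [(0.95, 0.90), (0.99, 0.95), (0.90, 0.99)]

weights = [math.log2(p_h1 / p_h2) for p_h1, p_h2 in pairs]
for (p_h1, p_h2), w in zip(pairs, weights):
    print(f"P(obs|H1) = {p_h1:.2f}, P(obs|H2) = {p_h2:.2f}: {w:+.2f} bits favouring H1")
```

Each weight is a small fraction of a bit, so observations of this kind barely move the odds in either direction.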

How does hypothesis H2 account for this mountain of evidence so easily? A key requirement for a successful deception operation is to create what look like many independent sources of evidence, even though they are all in fact generated by the operation. This principle was brilliantly applied by the legendary Naval Intelligence Division in the deception operations that led German commanders to expect Allied landings in 1943 in Greece rather than Sicily, and in 1944 in Calais rather than Normandy. Thus under H2, if the planners are competent, we expect to see videos badged with the logos of different opposition media agencies and uploaded separately, even though they may all originate from a central video editing operation.

At this point you may reasonably ask: if H2 can so easily account for this mountain of evidence, what possible observations could give a likelihood ratio strongly favouring H1 over H2? Such observations are those that would be expected under H1, but very difficult to generate under H2. For instance if H1 were true, any of the following observations might be expected to contribute evidence favouring H1 over H2:-

  1. if we were presented with convincing and hard-to-fake evidence that the victims seen dead in the images had lived in the locality from which they were supposedly rescued
  2. if interviews with bereaved survivors included convincing and hard-to-fake evidence that the dead victims were their relatives, including family photos showing them with these victims. These family photos should include adult victims, who unlike young children cannot easily be induced to pose in a familiar setting with their captors.
  3. if videos showed the search and rescue operations in which these victims’ bodies were recovered: these operations would be hard to stage on a large scale without the cooperation of civilians.
  4. if a chemical signature match between the environmental sarin samples and Syrian military stocks were reported by scientists prepared to put their names on a report that was detailed enough to be subjected to independent peer review.
  5. if blood tests on purported survivors of the chemical attack showed exposure to sarin at levels high enough to have caused severe and life-threatening poisoning. Modern tests for sarin exposure can detect exposure at levels far lower than those required to cause symptoms. It would be easy for actors to expose themselves to low doses of sarin, but not so easy for them to expose themselves at levels high enough to cause severe symptoms.

It’s also useful to list, before going further, what possible observations might be expected to contribute evidence favouring H2 over H1 if H2 were true:-

  1. if the locations of victims and alleged air strikes were not consistent with records of flight tracks or with wind directions. Under H2, locations of improvised explosive devices would have to be planned in advance, without knowing where a jet would fly or which way the wind would be blowing.
  2. if the uploaded videos contained evidence that scenes were staged or that the victims were captives. Under H2, a weak point in the operation is that dozens of video clips and still images that are meant to show rescue workers dealing with large numbers of victims have to be recorded, edited and uploaded in a few hours, and the editing may fail to remove incriminating material. When all available images are arranged in temporal sequence, using sun angles and other clues to time the images, and the identities of victims are matched in different clips, a different story may be revealed, as in Kafr Batna.

With this in mind, we can evaluate the weight of evidence contributed by five observations that have been summarized here.

Weights of evidence contributed by observations

| Observation | P(obs given H1) | P(obs given H2) | Likelihood ratio H2/H1 | Weight of evidence favouring H2 over H1 (bits) |
|---|---|---|---|---|
| An individual claiming to be a bereaved survivor was made available for interview, with photos showing him with two children later seen as victims. The lack of photos of his wife was attributed to loss of the family photo album in an airstrike on the family home. | 0.002 | 0.04 | 20 | 4.3 |
| There are no videos of victims being rescued in their homes, or bodies being recovered. | 0.05 | 0.8 | 16 | 4 |
| The flight track of the Syrian jet shown by the Pentagon (single east-west pass just south of the town) is incompatible with the track of the three explosions (north-south axis over the northern part of town) and the alleged impact site of the chemical munition. | 0.01 | 0.8 | 80 | 6.3 |
| The alleged impact site of the chemical munition is upwind of where the casualties were reported (by the rebels) to have occurred. | 0.02 | 0.5 | 25 | 4.6 |
| In the images released by the rebels, several of the children who are seen dead have head and neck injuries. Reconstruction of sequences and matching of identities shows that in two of these children the head injuries were received after they had been supposedly rescued by the White Helmets. | 0.01 | 0.2 | 20 | 4.3 |
| Total | | | | 23.5 |
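The table's arithmetic can be reproduced in a few lines. Adding the rows assumes, as the text does, that the five observations are independent given each hypothesis; the row labels below are shortened versions of the table entries.

```python
import math

# (P(obs | H1), P(obs | H2)) for each row of the table, as assigned in the post.
rows = {
    "bereaved survivor, missing photos of wife": (0.002, 0.04),
    "no search-and-rescue videos": (0.05, 0.8),
    "flight track vs. explosion sites": (0.01, 0.8),
    "impact site upwind of casualties": (0.02, 0.5),
    "head injuries received after rescue": (0.01, 0.2),
}

total_bits = 0.0
for label, (p_h1, p_h2) in rows.items():
    ratio = p_h2 / p_h1
    bits = math.log2(ratio)
    total_bits += bits
    print(f"{label:45s} LR = {ratio:3.0f}  {bits:4.1f} bits")
print(f"total weight favouring H2 over H1: {total_bits:.1f} bits")
```

Summing the unrounded weights gives about 23.6 bits; the table's 23.5 is the sum of the rounded row values.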

Notes on assignment of likelihoods

  1. In Khan Sheikhoun at least two individuals claiming to be bereaved survivors were interviewed. Most of the interviews were given by Abdelhamid al-Yousef (AHY), who appears to have been serving in the opposition forces as a sniper. AHY reported that his wife and nine-month-old twins had been killed in the chemical attack, and produced photos showing him with two children about this age who were among the dead victims. No photos showing AHY with the mother of these children were produced: an interviewer reported that “he does not even have any photos of his beloved wife of two years left to console him, as they were all destroyed in the attack that ripped through his hometown.” and quoted him as saying “In my house all the photos I had of my wife and everything I owned was burnt.”
     Under H1, it is expected that at least one bereaved survivor would be available for interview. However, the probability is rather low that the witness’s home would have been destroyed in an air strike at the same time as the alleged chemical attack, given that only three explosions were documented as occurring in Khan Sheikhoun at this time. These explosions were geolocated by smoke plumes, satellite images and ground-based images. The explosions appear to have been relatively small, each destroying only a single house. If, as alleged, these explosions were caused by bombs dropped by an aircraft in a single pass over the northern half of town, we can estimate the area at risk as about 30 hectares, and the number of homes at risk as about 1500 (based on a typical urban density of 50 homes/hectare). The probability that the witness’s home would have been one of the buildings destroyed by these three explosions is therefore about 1 in 500.
     Under hypothesis H2 that Khan Sheikhoun was version two of Ghouta, there is a moderate probability that at least one actor would have been prepared to play the part of a bereaved survivor, and would have posed for photographs with captive children. I’ll assign a probability of 0.2 to this. The problem for such an actor would be to explain the lack of photographs showing him with the adult victims from the same family. It is much easier to get young children to play happily with an adult who befriends them than it is to induce adults to pose for a family photograph with their captors. Of the possible explanations that such an actor might choose to give, one of the most likely (to emphasize the brutality of the regime) is that the family home was destroyed in an airstrike. I’ll assign a probability of 0.2 that this explanation would be produced. Multiplying the conditional probability under H2 that an actor with photos showing him with the children would be made available for interview by the probability that this actor would invoke destruction of the family photo album in an airstrike to explain the lack of photos showing him with the mother, we get a likelihood of 0.04.
     The likelihood ratio favouring H2 over H1 is 20. Note that this assessment of likelihoods does not involve any judgement of whether AHY is telling the truth or lying. We have shown that under H1, it is a rather improbable coincidence that one of the few homes destroyed by three apparently untargeted bombs dropped on a town of at least 20,000 people would be that of the sole survivor of a large extended family killed in a chemical attack at the same time. We also assess that under H2, it is quite probable that an actor playing the part of a bereaved survivor would report the destruction of his home in an airstrike as an explanation for why no family photos showing him with adult victims were available. Computing the ratio of these two likelihoods allows us to make a statement about the strength of the evidence contributed by this observation.
  2. In all the videos and images released by the White Helmets and other opposition media organizations from Khan Sheikhoun, there are no images of urban search and rescue operations. Under H1, we’d expect to see videos of the White Helmets carrying out a search and rescue operation covering the neighbourhood allegedly affected by the chemical attack. The White Helmets are trained in urban search and rescue procedures and are famous for documenting their operations on video. The absence of such videos has low probability (conservatively assessed at 0.05) under H1, but high probability (0.8) under H2 as it would be difficult to stage such scenes without involving large numbers of civilians.
  3. The flight track of the Syrian jet shown at the Pentagon’s press conference shows only a single east-west pass just south of the town, passing no closer than 2 km from the crater that was the alleged impact site of the chemical munition. The three high-explosive detonations, mapped by OPCW based on witness reports, and by others based on geolocation of smoke plumes and images (satellite and ground-based) of explosive damage, are in the northern half of town in a north-south line. From the scatter of the points that were plotted on the Pentagon’s map, we can estimate the accuracy of the flight track (presumably based on airborne radar). By inspection of other east-west passes on this map, I estimate that the standard deviation of the errors in a north-south direction is less than 1 km. For the jet to have passed over the alleged impact site, at least two data points would have to have been plotted too far south by at least two standard deviations: the probability of this is about 1 in 1000. Even more unexpected under H1 is that the flight track does not show the north-south pass that would have been required to drop three bombs corresponding to the three documented high-explosive detonations.
     As the Pentagon’s map appears to include at least one false-positive data point (an outlying data point southwest of Homs city that does not appear to be part of a flight track), it is reasonable to allocate a small but nonzero probability to false-negative results: specifically, a failure to detect a north-south pass. To be conservative, I’ll assign a value of 0.01 to the probability under H1 that the Pentagon’s map of the track of the Syrian jet would match neither the position nor the alignment of the reported impact sites.
     Under H2, the explosions were generated by IEDs, and the arrival of the jet was the cue to set off these explosions. The probability that the pre-planned line-up of IEDs would not match the flight path of the jet is high – I assign a probability of 0.8 to this.
  4. The videos of the smoke plumes from the three high-explosive detonations, recorded by opposition cameramen and said to have occurred just before the alleged chemical attack, show that the wind was blowing steadily from southwest to northeast. The OPCW’s map of the area in which casualties allegedly occurred, based on reports from eyewitnesses, shows that this area is southwest – i.e. upwind – of the alleged impact crater. Under H1, this is difficult to explain: we have to postulate some unusual local reversal of wind direction at ground level. I assign a probability of 0.02 to this. Under H2, in which the locations from which casualties were to be reported and the location of the impact crater were planned in advance, the probabilities that the specified casualty location would be upwind or downwind of the impact crater are about equal, so the probability of an upwind location is about 0.5.
  5. The images of the victims are so horrific that most of us find it difficult to look at them further. Detailed frame-by-frame analyses of the many video clips and still images can take many months. A few citizen journalists in different countries, sharing their work for peer review, have made some progress with this. Careful examination of the videos and still images, using sun angles to time them, has allowed them to be ordered in temporal sequence and the same individuals to be matched across different videos. Several of the children seen dead in improvised morgues have obvious and recent head injuries. In at least two of these children, it is possible to establish that these head injuries were received after they had been “rescued” by the White Helmets. Under H1, the probability that at least two victims would receive traumatic injuries after they had been rescued is very low. The most plausible explanation under H1 is a traffic accident while they were being transported in an ambulance or a pickup truck. A rough estimate for the rate of serious injuries from road traffic accidents in a low-income country like Syria in wartime is 1 per million vehicle-kilometres. Allowing for a tenfold higher rate per vehicle-km in vehicles used as emergency ambulances, and a total distance of 200 vehicle-km travelled by vehicles transporting casualties in the Khan Sheikhoun incident, the probability of an accident causing injuries to some of these casualties is about 0.002. Note that this is the risk of a single accident that is assumed to account for all injuries received after rescue; if the injured children did not travel in the same ambulance, we have to postulate multiple accidents, for which the probability is far lower.
    Again, to be conservative, I’ll assign a conditional probability of 0.01 to these injuries occurring by accident under H1.
    Under H2, it is probable that some victims would survive the gas, either by accident or by design (if the plan was to film some children while still alive for maximal emotional impact). These victims would have to be finished off with physical violence, and the probability is high that this would include blows to the head or neck.
    The probability that editing of the videos would fail to remove the incriminating sequence of images is also moderately high, given the large number of videos that had to be edited and uploaded over a few hours. I assign a probability of 0.2 to this observation given H2.
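The arithmetic in items 3 and 5 can be checked with the short Python sketch below. It assumes, which the text does not state, that the north-south plotting errors are independent and normally distributed, and that injury accidents follow a Poisson process; the variable names are illustrative only. Under these assumptions, two points each displaced at least two standard deviations southward has a probability of roughly 1 in 1900, the same order of magnitude as the “about 1 in 1000” in item 3, and the accident calculation reproduces the figure of about 0.002.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Item 3: two track points, each displaced at least 2 SD in the same (southward)
# direction, assuming independent, normally distributed north-south errors.
p_two_displaced_points = normal_upper_tail(2.0) ** 2

# Item 5: expected number of injury accidents over the stated distance.
injury_rate_per_vehicle_km = 1e-6  # rough wartime rate quoted in the text
emergency_multiplier = 10          # tenfold allowance for emergency vehicles
vehicle_km = 200                   # total distance travelled by transporting vehicles
expected_accidents = injury_rate_per_vehicle_km * emergency_multiplier * vehicle_km

# Treating accidents as a Poisson process: P(at least one) = 1 - exp(-expected count).
p_at_least_one_accident = 1.0 - math.exp(-expected_accidents)

if __name__ == "__main__":
    print(f"item 3: {p_two_displaced_points:.2e} (about 1 in {1 / p_two_displaced_points:,.0f})")
    print(f"item 5: expected {expected_accidents:.4f}, P(at least one) = {p_at_least_one_accident:.4f}")
```

For a rate this small the Poisson probability and the simple product (rate times distance) are practically identical, which is why the text can quote the product directly.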

From this evaluation, I assess the total weight of evidence favouring H2 over H1 as about 23 bits, giving a likelihood ratio of about 8 million to 1. This might be described as a mountain of evidence.
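The conversion between conditional probabilities, bits and likelihood ratios can be sketched in a few lines of Python. The sketch assumes, as the additive bookkeeping in the text implicitly does, that the observations are conditionally independent given each hypothesis. It reproduces only items 3 to 5 of the list, using the probabilities assigned above; the function names are hypothetical.

```python
import math


def evidence_bits(p_obs_given_h2: float, p_obs_given_h1: float) -> float:
    """Weight of evidence (bits) that one observation contributes in favour of H2 over H1."""
    return math.log2(p_obs_given_h2 / p_obs_given_h1)


def bits_to_likelihood_ratio(total_bits: float) -> float:
    """Convert a total weight of evidence in bits back to a likelihood ratio."""
    return 2.0 ** total_bits


# (P(observation | H2), P(observation | H1)) as assigned in numbered items 3 to 5 above.
ASSIGNED = {
    "3 flight track": (0.8, 0.01),
    "4 wind direction": (0.5, 0.02),
    "5 post-rescue injuries": (0.2, 0.01),
}

# Assuming conditional independence, the bits from each observation simply add.
contributions = {name: evidence_bits(p2, p1) for name, (p2, p1) in ASSIGNED.items()}
subtotal_bits = sum(contributions.values())

if __name__ == "__main__":
    for name, b in contributions.items():
        print(f"item {name}: {b:.2f} bits")
    print(f"items 3 to 5 subtotal: {subtotal_bits:.2f} bits")
    print(f"23 bits as a likelihood ratio: {bits_to_likelihood_ratio(23):,.0f} to 1")
```

Run as a script, it reports about 6.32, 4.64 and 4.32 bits for items 3, 4 and 5, a subtotal of about 15.29 bits; the remainder of the stated total of about 23 bits would have to come from the earlier-numbered items, which are not reproduced in this excerpt. It also confirms that 2^23 = 8,388,608, consistent with “about 8 million to 1”. An observation assigned the same probability under both hypotheses contributes exactly 0 bits.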

Although these assignments of the conditional probabilities of the observations given H1 and H2 entail subjective judgements on my part, it should be possible for people with different prior odds to reach consensus on these conditional probabilities, and thus on the likelihood ratios. You may be able to improve on and correct my judgements of the conditional probabilities, using additional information. For instance:-

  • By fitting smoothed curves to the points shown on the Pentagon’s map of the flight track, it should be possible to make a better estimate of the probability distribution of the errors in the data points that make up the flight track.
  • Someone with meteorological expertise may be able to assign a more realistic probability of a local reversal of wind direction at ground level.
  • Further analysis of the videos may establish whether a single traffic accident to an ambulance can account for all children who were injured after they had been rescued.
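A minimal sketch of the first suggestion, in Python: fit a curve to the plotted points of one pass and use the spread of the residuals as an estimate of the north-south error. The Pentagon’s coordinates are not reproduced here, so the example runs on synthetic data. It also assumes that the points have been digitized as (east, north) positions in kilometres, and it uses a straight line, the simplest smoother, which suits a roughly straight east-west pass; for a curved track, a low-order polynomial or spline fit would be the natural substitute. All names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of north = intercept + slope * east."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sd(xs: Sequence[float], ys: Sequence[float]) -> float:
    """SD of north-south residuals about the fitted line (n - 2 degrees of freedom)."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))


if __name__ == "__main__":
    # Synthetic illustration only: a straight east-west pass with 0.5 km north-south noise.
    rng = random.Random(0)
    xs = [i * 0.5 for i in range(40)]
    ys = [3.0 + 0.01 * x + rng.gauss(0.0, 0.5) for x in xs]
    print(f"estimated north-south SD: {residual_sd(xs, ys):.2f} km")
```

With the estimated standard deviation in hand, the probability of a given displacement follows from the normal tail area, as in item 3.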


From this evaluation of the likelihoods of alternative explanations of the alleged chemical attacks in Ghouta in 2013 and Khan Sheikhoun in 2017 given some key observations, I assess that the evidence favouring the hypothesis of a managed massacre of captives over the hypothesis of a regime chemical attack is overwhelming (at least 20 bits) both for the Ghouta attack and for Khan Sheikhoun. The calculations and subjective judgements on which these assessments are based are set out above. The evaluation of weights of evidence does not depend on prior beliefs about which hypotheses are plausible. To modify this conclusion about the weight of evidence, you have either to identify additional observations which would have been predicted better by the regime chemical attack hypothesis than by the managed massacre hypothesis, or to criticize and revise my assessments of the conditional probabilities of the observations listed above given each of these two hypotheses. I’ve suggested above some ways in which additional information could be used to revise these conditional probabilities. If you believe that either the managed massacre hypothesis or the regime attack hypothesis is implausible, I am not disagreeing with you: priors are subjective. However, for your beliefs to be logically consistent, your priors must be updated by the weight of evidence according to Bayes’ theorem.
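Updating a prior by a weight of evidence is simplest in log-odds form: the posterior log2-odds equal the prior log2-odds plus the weight of evidence in bits. A minimal Python sketch, with hypothetical function names:

```python
import math


def update_log2_odds(prior_log2_odds: float, evidence_bits: float) -> float:
    """Bayes' theorem in log form: posterior log2-odds = prior log2-odds + evidence (bits)."""
    return prior_log2_odds + evidence_bits


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1.0 + odds)


def posterior_probability(prior_probability: float, evidence_bits: float) -> float:
    """Posterior probability of a hypothesis after adding a weight of evidence in bits."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_log2_odds = update_log2_odds(math.log2(prior_odds), evidence_bits)
    return odds_to_probability(2.0 ** posterior_log2_odds)


if __name__ == "__main__":
    # Hypothetical priors, chosen only to show how strongly the prior matters.
    for prior in (1e-6, 0.01, 0.5):
        print(f"prior {prior:g}: posterior after 20 bits = {posterior_probability(prior, 20):.6f}")
```

For example, with a hypothetical prior probability of one in a million and 20 bits of evidence, the posterior probability comes out at about 0.51; with a prior of 0.5 and the same evidence, it exceeds 0.999999.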

The strength of the evidence favouring the managed massacre hypothesis over the regime chemical attack hypothesis has quite radical implications for the credibility of western media, western governments and international agencies such as OPCW; you may reasonably ask “how could they have got it so wrong?”.



This entry was posted in chemical weapons, disinformation, guest blog, journalism, propaganda, Syria, UK Government, Uncategorized, war, White Helmets.

17 Responses to Who is Responsible for Chemical Attacks in Syria? Guest Blog by Professor Paul McKeigue (Part 2)

  1. Loverat says:

    Very interesting article from a different perspective. In my work we are trained to weigh things up on the balance of probabilities. There are two narratives which you would explore, and certainly, if one side fired missiles at the other, this would be pre-judging. Trump has fallen into the groupthink on Syria. This is so obvious.

    My own particular research falls totally in line with what I read here. We are all being misinformed on the level Germany was in the 1930s.

    Quite shocking that there are few journalists not realising the damage they are causing to the people of Syria – and the stability of the world.

  2. The population density in Zamalka pre-war was around 300 people per hectare. The density could vary significantly depending on whether people moved into the area – say from Jobar – or moved out due to proximity of the front line in Jobar.
    I ran computer models on the release of varying quantities of sarin under prevailing weather conditions, which varied during the night. The lethal part of the plumes was hundreds of metres in length and tens of metres in width, assuming the larger quantity of gas possible per missile. Significant effects perhaps up to 800 m in length?
    Plume lethality is a function of cumulative time and concentration, with a high sarin concentration over a short time being more lethal than a low concentration over a long time for the same cumulative exposure.
    Non-lethal effects, especially miosis, would have been obvious kilometres downwind from each release point. Depending on population density, that could include many thousands of people.
    The last point is important for Moadimiyah, as the wind at dawn was blowing into the Mezzeh suburb, which is home to many foreign diplomats; minor effects would have been seen there, however none were reported, leading to the suspicion that no sarin was actually released in Moadimiyah.
    My analysis used the EPA model SLAB, which is optimised for denser-than-air gases. It generates data that can be post-processed to approximate cumulative exposure in a geographical area.
    My overall feeling is that if the missiles were fully loaded and the alleged number is correct, the number of casualties and deaths would have been far larger than reported.

    • Paul McKeigue says:

      In the post I tried to estimate the scale of the search and rescue operation that would have been required if there had been a real chemical attack in east Ghouta. I think my estimate of at least 5000 homes to be searched is consistent with your estimates of population density and area affected (assuming average household size of 5 or 6, 300 people/hectare, and 10-12 rockets causing casualties over a total area of one square kilometre). Given the hypothesis of an attack on this scale, it is improbable that the rescue of living victims and the removal of bodies could have been completed so quickly, and that no videos of this operation were recorded. The OPCW Fact-Finding Mission made no attempt to investigate this, or to map the area allegedly affected by plotting the locations of casualties – the first step in any epidemiological investigation of such an incident.
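The arithmetic behind the estimate of at least 5000 homes can be written out as a small Python helper, using the inputs stated in the reply (one square kilometre, 300 people per hectare, households of 5 or 6). The function name is hypothetical.

```python
def homes_to_search(area_km2: float, people_per_hectare: float, household_size: float) -> float:
    """Number of dwellings in an area, given population density and mean household size."""
    hectares = area_km2 * 100  # 1 square kilometre = 100 hectares
    return hectares * people_per_hectare / household_size


if __name__ == "__main__":
    for size in (5, 6):
        print(f"household size {size}: {homes_to_search(1.0, 300, size):,.0f} homes")
```

This gives 6000 homes for a household size of 5 and 5000 for a household size of 6.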

  3. Mark says:

    Thanks for describing Bayes’ theorem and then applying it to Ghouta and Khan Shaykhun. I do have a couple of criticisms.

    Yes, priors are subjective, but they need to be stated, together with an explanation of why they were chosen; in that respect they are just like likelihood ratios. You can’t get a posterior probability without a prior probability. This is very important in these cases because the two hypotheses of a faked sarin attack and a gas chamber massacre are very unlikely. A reference class of attacks that are claimed to be false flags seems the best I can think of. And I’m not aware of any examples of false flag attacks that killed hundred(s) of victims by a different method than the one publicly presented. Using gas chambers to surreptitiously kill hundred(s) of victims makes the prior probability even lower.

    Both examples include only 4 or 5 observations and they all favor the gas chamber massacre hypotheses, opening up the criticism of cherry picking evidence. There is certainly evidence that favors the other hypotheses. As much evidence as possible should be used.

    You have to be very careful about using arguments from silence. The dog not barking is a clear cut example of a valid argument from silence. We know a guard dog will bark at a stranger and anyone nearby would hear it. Not finding video-documented search and rescue operations or victims with pictures are not so clear cut. How sure are we that these would happen and we would find them?

    This is a minor point, but not all victims in Ghouta were wearing day clothes. I’ve seen dozens of men in wife-beaters, I don’t think those are considered day clothes. There are also clearly children in pajamas. And most women victims are covered, so we don’t know what they were wearing.

    Finally, a question. Can the log and bits method be used when you have more than two hypotheses? It is necessary to calculate the likelihood of each piece of evidence for all hypotheses.

    • Adrian D. says:

      @Mark. You’re absolutely right that we have to be careful of arguments from silence in general, but I suggest we are now well past the need for that in Syria and Khan Sheikhoun in particular.

      We can be more than usually confident, if not absolutely certain, that video recordings would exist, given that they are relatively commonplace for the White Helmets, that the incentives to capture such harrowing footage were huge, that a number of White Helmets said they had time to return to base for equipment and still return to the scene, that HRW says that they have seen (but have not released) videos of the gas of the alleged first attack, that at least one White Helmet claims to have been photographing as he collapsed, and that two local ‘journalists’ who videoed the conventional bombs and then allegedly attended the scene inexplicably did not attempt to capture any images of the carnage supposedly developing in front of them.

      FWIW, I think it’s very likely that the number of victims has been grossly exaggerated in both the Ghouta and KS incidents.

      • Mark says:

        I agree that KS is much more likely to have rescue ops filmed. I don’t think the White Helmets were in Ghouta at the time.

        I think close to 100 deaths in KS is likely correct, while I would guess 300 for Ghouta.

  4. Paul McKeigue says:

    Bayes theorem can be used to compare several hypotheses (as in the examples on the Rootclaim site), but you’ll get the same answer as to which hypothesis is best supported by the observations if you just choose one hypothesis as baseline, and compare other hypotheses one at a time with this baseline hypothesis.
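The point about choosing a baseline can be checked numerically. In this Python sketch the four hypotheses A to D, their priors and their likelihoods are invented purely for illustration; they are not values for any hypothesis discussed here. Normalizing over all hypotheses and computing log2 posterior odds against a single baseline rank the hypotheses identically, because the normalizing constant cancels in every ratio.

```python
import math
from typing import Dict


def posteriors(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Posterior probability of each hypothesis, normalized over all of them."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}


def log2_odds_vs_baseline(priors: Dict[str, float], likelihoods: Dict[str, float],
                          baseline: str) -> Dict[str, float]:
    """Posterior log2-odds of each hypothesis against one baseline; no normalization needed."""
    base = priors[baseline] * likelihoods[baseline]
    return {h: math.log2(priors[h] * likelihoods[h] / base) for h in priors}


# Invented illustrative values only.
PRIORS = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
LIKELIHOODS = {"A": 0.02, "B": 0.05, "C": 0.02, "D": 0.03}

if __name__ == "__main__":
    post = posteriors(PRIORS, LIKELIHOODS)
    rel = log2_odds_vs_baseline(PRIORS, LIKELIHOODS, baseline="A")
    print(sorted(post, key=post.get, reverse=True))
    print(sorted(rel, key=rel.get, reverse=True))
```

Both lines print the same ranking, B, A, C, D; the baseline itself always scores 0 bits.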

    A key message of these two articles is that judgements of prior probabilities need not and should not prevent us from evaluating the weights of evidence favouring one hypothesis over another. If you have prior odds of a million to one favouring one hypothesis over another and you’re therefore not impressed by a likelihood ratio of 1000 to 1 in the other direction, I’m not arguing with you. For others whose minds are less firmly made up, this strength of evidence may be enough to change what they believe to be the most probable explanation.

    It’s an interesting problem to think about how one might proceed to set a prior probability for the managed massacre hypothesis that is consistent with what was known at the time. One possible approach would be to tabulate earlier reports in which Syrian rebels in possession of the bodies of massacred civilians had blamed the regime, and to estimate the proportion of these massacres in which the rebels were the perpetrators. Another possible approach would be to examine earlier reports of massacres that led to western intervention in a civil war (Yugoslavia, Libya) and to estimate the proportion of these massacres in which the perpetrators were those that stood to gain from the intervention.

    I don’t think the argument that no previous false flag massacre has entailed using gas to kill victims is relevant to setting the prior probability of a managed massacre. Given that the objective of such a massacre would have been to show that the “red line” of a chemical attack had been crossed, the victims had to be killed by some method that did not leave obvious signs of external injury, and the scale of the massacre had to be large enough that it could not plausibly be attributed to a chemical attack by an opposition group with access to small quantities of sarin.

    Would you like to list some more observations from either Ghouta or Khan Sheikhoun that should be included? We can then try evaluating their contributions to the total weight of evidence favouring the managed massacre over the regime attack hypothesis. For an observation to contribute evidence, that observation has to be more probable under one hypothesis than it is under the other. So, for instance, eyewitness reports of an aircraft dropping something (expected under both hypotheses) do not contribute evidence.

    • Mark says:

      Yes, prior probability doesn’t affect evaluating evidence for or against a hypothesis, but it still needs to be dealt with. I think you should argue with other people’s prior probabilities where you don’t agree. Otherwise two people can disagree about the most likely hypothesis without knowing where their disagreement comes from. So what are your prior probabilities for the hypotheses and why?

      It is remarkably hard to construct a prior probability for such a unique event. It is not unique merely because it is a massacre of civilians, or even because it is a potential false flag. The most unique parts are capturing people unharmed then killing them in gas chambers and planting evidence of a chemical attack. Yes, they needed to make it look like a chemical attack to cross the red line. But that does not make it more likely that they could actually pull it off.

      It is difficult to lay out evidence against H4 and H2 because they are formulated to cover the evidence we see. The other hypotheses don’t lay out specifics about the attack, just who is guilty. H4 and H2 state who is guilty, a secret method of killing the victims, and creating false evidence to blame the Syrian government. Adding specificity to hypotheses gives them a lower prior probability. I would argue these two additions make them very much less likely. This is why stating prior probability is so important.

      For instance, there is evidence of victims testing positive for exposure to Sarin, around 20 people in each case. If there was an actual Sarin attack, we would expect people to be exposed to Sarin. But we would also expect it if evidence was planted. What is different between these two hypotheses is their prior probability.

      • causticlogic says:

        “It is remarkably hard to construct a prior probability for such a unique event.”
        Evidently not for you, as you call it “unique.” It’s actually quite familiar to some of us who’ve looked into the details repeatedly over time.

        “The most unique parts are capturing people unharmed then killing them in gas chambers and planting evidence of a chemical attack. Yes, they needed to make it look like a chemical attack to cross the red line. But that does not make it more likely that they could actually pull it off.”
        At least we’re not stuck on an imagined morality barrier. Your question is could they do it. Here’s how they capture people unharmed, with plenty of parallels (see for example, HRW report “You can still see their blood” 2013): They overrun an area, killing or chasing off the army, killing all the infidel men aged 13 and up, per relevant fatwas, more or less. Everyone else doesn’t resist much. With a few black eyes, maybe some naked women’s bodies left to rot on a rooftop, they have a bunch of people the fatwas say they can use in various ways. They’re not allowed to gas women and children – and take open credit for it. Otherwise, whatever.

        And, perhaps too obvious but can’t go unnoted: Islamist forces reportedly abducted some 250 civilians from Khatab and Majdal and took them to Khan Sheikhoun, just 5 days before this alleged sarin attack. There was early talk of those being the victims, but so far no known IDs or anything, nor even clear confirmation if it happened, if it was mostly women and children as usual, or what. Also, they might have used less obvious captives taken more quietly from within the rebel-held areas. Either way, the given names and stories are quite possibly falsified.

      • Mark says:

        I call it unique because I know of no example of a non-state actor secretly using gas chambers to kill hundreds of victims. If you know of such an example, please let me know.

  5. Pingback: The alleged Khan Sheikhoun chemical attack « Human rights investigations

  6. Paul McKeigue says:

    You raise several points (paraphrased below in italics) which I’ll try to deal with one at a time:

    Why not specify prior probabilities for each hypothesis (as Rootclaim does)?
    I’m not sure why you think it’s so important to specify a prior. If the weight of evidence favouring the managed massacre over the regime attack hypothesis for Ghouta and Khan Sheikhoun is 20 bits, this is enough to overwhelm almost any reasonable prior.
    Abducting civilians is "unique" to the managed massacre hypothesis.
    If you look through the coverage of the Syrian conflict, there are many reports of civilians (mostly from the Alawite minority) including women and children being abducted and held captive by the opposition. Some of these reports have been confirmed by subsequent release of captives in prisoner swaps, or in Ghouta by the opposition exhibiting women captives in cages as human shields.
    The two false flag hypotheses – opposition chemical attack (H2) and managed massacre (H4) are "formulated to cover the evidence we see".
    This is an important point: a hypothesis that has adapted to fit the data is less probable than a hypothesis that would have predicted the data without having to adapt. It’s not the prior, but the likelihood, that takes this into account. Calculation of the likelihood of a hypothesis imposes an automatic penalty (the "Occam factor") for complex hypotheses with floating parameters that adapt to the data. See the discussion of this in the comments on Part 1, and especially the link to MacKay’s explanation.
    Looking back at the coverage of the Ghouta incident, the hypothesis of some kind of false flag was proposed as soon as the story broke, on the basis of "who benefits". However, the specific hypothesis H4 was not proposed until a few days later, on the basis that images of search and rescue operations were missing. So you can reasonably argue that H4 was a data-derived hypothesis. To take account of this, we should specify a single "false flag hypothesis" combining H2 and H4, with a floating 0-1 variable (0 = opposition chemical attack, 1 = managed massacre). This hypothesis has to pay a penalty (Occam factor) of 1 bit for adapting to the data. With a weight of evidence of more than 20 bits, this wouldn’t make much difference to the conclusion.
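The 1-bit penalty can be seen by marginalizing over the floating variable. A minimal Python sketch, assuming equal prior weight (0.5) on each of the two values of the variable: the composite hypothesis’s likelihood is then the weighted average of its two sub-hypotheses’ likelihoods, so relative to whichever sub-hypothesis fits better it loses at most log2(2) = 1 bit, approaching that bound when the other sub-hypothesis predicts the observations poorly. The function names and the likelihood values in the usage lines are hypothetical.

```python
import math
from typing import Optional, Sequence


def composite_likelihood(sub_likelihoods: Sequence[float],
                         weights: Optional[Sequence[float]] = None) -> float:
    """Marginal likelihood of a hypothesis whose discrete free parameter picks one sub-hypothesis."""
    if weights is None:
        # Equal prior weight on each value of the floating variable.
        weights = [1.0 / len(sub_likelihoods)] * len(sub_likelihoods)
    return sum(w * lik for w, lik in zip(weights, sub_likelihoods))


def occam_penalty_bits(sub_likelihoods: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Bits lost by the composite hypothesis relative to its best-fitting sub-hypothesis."""
    return math.log2(max(sub_likelihoods) / composite_likelihood(sub_likelihoods, weights))


if __name__ == "__main__":
    # Hypothetical likelihoods for the two values of the floating 0-1 variable.
    print(occam_penalty_bits([1e-9, 1e-3]))  # one sub-hypothesis fits far better: close to 1 bit
    print(occam_penalty_bits([1e-3, 1e-3]))  # both fit equally well: no penalty
```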
    It’s difficult to lay out evidence against the false flag hypothesis
    The blog post lists five possible sources of observations that would be expected to yield evidence favouring the regime attack hypothesis over the managed massacre hypothesis if the regime attack hypothesis were correct. Specifically in relation to blood tests, I have noted that if the measured concentrations of sarin were reported, and if it could be established that the levels of exposure measured in the blood tests were consistent with the clinical histories, this would provide evidence favouring regime attack over managed massacre. Unfortunately the OPCW labs have not reported any quantitative results of the blood tests on purported victims in Ghouta and Khan Sheikhoun.

    • Mark says:

      1. It’s as simple as “extraordinary claims need extraordinary evidence”. The secret gas chambers and planted sarin evidence is a very extraordinary claim. I’d say one in a million is a reasonable estimate of the prior probability. It has never happened before and requires an enormous effort to pull off and keep secret. One of the points of using Bayes is to allow people to quantify their differences and argue for their view without talking past one another. If you’re not willing to discuss what you think the priors are, then we can’t have that discussion.

      2. Abducting civilians isn’t the unique part. Secret gas chambers and planting evidence of a chemical attack are unique.

      3. I’ll have to look into this point more. But accepting your points for the sake of argument, a 1-bit penalty for the specificity of H4 and H2 seems very small. And claiming that evidence of Sarin use was planted was certainly not data-driven, but rather to cover the evidence.

      4. We know more about the KS Sarin exposure, because some victims were immediately taken to Turkey, where the OPCW had access to them. In that case, the Sarin concentration was high enough to kill three people and hospitalize many others for many days. Given that the lethal dose of Sarin is so small, I find it hard to believe that the opposition was able to correctly dose the staged victims. And are you claiming these victims volunteered? And what about the videos of victims still alive in hospitals and clinics? They can’t have been secretly gassed because any survivors would be a risk of exposing the plot. So how many people were purposefully exposed to Sarin by the opposition?

      • Adrian D. says:


        I’m not sure that the ‘secret gas chambers’ hypothesis necessarily should be that difficult to imagine given that we are still talking of ‘sarin or sarin-like’ substances.

        Although written in possible support of an alternative hypothesis (a variant of the account in which a conventional bomb released a toxic agent), this recent AlterNet article suggests that the OPCW evidence for sarin is seriously flawed.

        These ‘gas chambers’ then could quite easily be a basement or even the back of a truck.

        That such a plan would require an enormous effort to keep secret might be true if we lived in a world where investigators were fearless, impartial, meticulous and led by the best available evidence. That the UN and OPCW are none of these is amply demonstrated here by their ignoring their own guidelines for chain of custody of samples, their refusal to forensically analyse the metadata of any of the digital media submitted, their refusal to publish the transcripts of their interviews, their insistence on using weather information gleaned from distant monitoring stations rather than the copious amount of video evidence of smoke clouds, their refusal to examine the radar tracks of the aircraft alleged to have carried out the attack, and their ignoring of clearly visible traumatic injuries in at least half a dozen of the victims.

        As for numerous victims increasing the chance of revealing the plot, again this would be a factor were that plot being seriously investigated by impartial bodies. That so many of the ‘victims’ accounts already contradict each other should be grounds enough to doubt their testimonies (some managing to survive for long periods in the gas without harm, others arriving after 2 hours to be rendered immediately ill, etc.).

      • Paul McKeigue says:


        Unfortunately the article by Gareth Porter that you link to contains serious scientific errors. Porter appears to think that positive results in lab tests for sarin exposure could be caused by phosphine. My comment on this earlier post explains the phrase “sarin or a sarin-like substance” and also lists the evidence suggesting that the Syrian opposition was producing “home-made” sarin in 2013.

  7. Paul McKeigue says:


    I think the event to which you are assigning a probability of one in a million is
    that the opposition will stage a chemical attack by massacring
    captives in gas chambers. Do I understand you correctly? I think
    most people would assign a very low unconditional probability to this event.

    However the probability we need for this exercise is the conditional probability
    of a managed massacre, given that the opposition in Ghouta is in possession of
    large numbers of bodies and is alleging a chemical attack by the regime.

    To take a famous example, in the OJ Simpson murder trial, the
    prosecution had produced evidence that he had a history of violence
    towards his wife. In response, the defence lawyer cited a statistic that only 1 in
    1000 men who are violent towards their wives go on to murder them,
    implying that this was the probability that Simpson was guilty. However, the conditional
    probability that the husband was the perpetrator, given that the wife
    was murdered and that the husband had a history of violence towards
    her, is greater than 0.9.

    So the prior probabilities you need to specify are the conditional
    probabilities of each of the four hypotheses – H1 regime chemical
    attack, H2 opposition chemical attack, H3 unauthorized chemical attack
    by rogue military unit, H4 managed massacre – given that an alleged
    chemical attack has occurred. All these events are rather improbable
    in that we would not have expected them to occur, but their
    conditional probabilities – assuming that these four
    hypotheses cover all possible explanations of the alleged chemical
    attack in Ghouta – have to add to 1.

    As discussed in the article, to predict how, under the hypothesis of a
    managed massacre as the explanation for KS, the planners would
    allocate victims, you have to put yourself in the shoes of a clever
    and ruthless Al-Qaeda commander. We would expect to see jihadis
    posing as civilian victims. Some of these would be exposed to low
    levels of sarin so that they could be presented for testing by OPCW.
    It would not be difficult to carry out a dose-finding study to find a
    dose that could safely be given to volunteers under medical
    supervision (such studies were done on thousands of volunteers at
    Porton Down from 1945 onwards, with only one fatality). Other jihadis
    might simply act the part of victims in the videos, mixed in with real
    victims. A few of the real victims would be selected for autopsy in
    Turkey and these would be exposed to sarin at high doses. They would have to
    be dead on arrival.

    Under the managed massacre hypothesis, it would be difficult for the
    planners to generate survivors whose blood tests showed previous exposure to
    sarin at levels high enough to cause severe or life-threatening
    symptoms. Such test results on survivors would favour the hypothesis
    of a real chemical attack. Unfortunately, as noted above, OPCW has
    not released quantitative blood test results for sarin from either
    Ghouta or KS.

  8. Pingback: Khan Sheikhoun Chemical Attack: Guest Blog Featuring Paul McKeigue’s Reassessment | Tim Hayward
