
Original article
Conducting a fully mobile and randomised clinical trial for depression: access, engagement and expense
  1. Joaquin A Anguera1,
  2. Joshua T Jordan1,
  3. Diego Castaneda1,
  4. Adam Gazzaley1,
  5. Patricia A Areán2
  1. 1Departments of Neurology and Psychiatry, University of California, San Francisco, California, USA
  2. 2Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
  1. Correspondence to Dr Patricia A Areán, Department of Psychiatry and Behavioral Sciences, University of Washington, 1959 Northeast Pacific Street, Seattle, WA 98195, USA; parean{at}


Importance Advances in mobile technology have resulted in federal and industry-level initiatives to facilitate large-scale clinical research using smart devices. Although the benefits of technology to expand data collection are obvious, assumptions about the reach of mobile research methods (access), participant willingness to engage in mobile research protocols (engagement), and the cost of this research (cost) remain untested.

Objective To assess the feasibility of a fully mobile randomised controlled trial using assessments and treatments delivered entirely through mobile devices to depressed individuals.

Design Using a web-based research portal, adult participants with depression who also owned a smart device were screened, consented and randomised to 1 of 3 mental health apps for treatment. Assessments of self-reported mood and cognitive function were conducted at baseline, 4, 8 and 12 weeks. Physical and social activity was monitored daily using passively collected phone use data. All treatment and assessment tools were housed on each participant's smart phone or tablet.

Interventions A cognitive training application, an application based on problem-solving therapy, and a mobile-sensing application promoting daily activities.

Results Access: We screened 2923 people and enrolled 1098 participants in 5 months. The sample characteristics were comparable to the 2013 US census data. Recruitment via yielded the largest sample. Engagement: Study engagement was high during the first 2 weeks of treatment, falling to 44% adherence by the 4th week. Cost: The total amount spent on this project, including staff costs and β testing, was $314 264 over 2 years.

Conclusions and relevance These findings suggest that mobile randomised controlled trials can recruit large numbers of participants in a short period of time and with minimal cost, but study engagement remains challenging.

Trial registration number NCT00540865.

  • Accessible
  • mHealth
  • Psychiatry

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:



Five hundred million individuals use mental health apps worldwide, with these numbers expected to reach 1.7 billion by 2018.1 The potential for mobile devices to revolutionise healthcare and clinical research has not been lost on either industry2 ,3 or academia.4–6 Notable examples of initiatives to collect behavioural data using mobile technology are the Patient-Centered Outcomes Research Institute (PCORI)-funded Patient-Centered Clinical Outcomes Research Networks, the National Institutes of Health's (NIH) Precision Medicine Initiative, and Apple's ResearchKit. Indeed, the use of smart technology appears to be a clear avenue to increase research participation.7 ,8

Mobile technologies may be particularly useful in improving participant access to and reducing expenses of randomised clinical trials (RCTs). Typical RCTs cost millions of dollars and recruit 200–300 participants in 3–5 years.9 Sample demographics are determined by the location of the research institution, limiting the representativeness of many RCT samples. Expense and access problems are exacerbated when trying to study populations who are challenging to recruit, such as those with mental illnesses, people living in rural areas or racial/ethnic minority populations.

One solution to overcome access and cost issues has been the use of the internet to conduct randomised controlled studies. These trials are beneficial from the cost perspective, with estimated cost reductions of more than 50% compared with conventional trials,10 ,11 and from the access perspective, recruiting very large samples in short periods of time.12 ,13 However, retention issues are particularly problematic for internet studies, with one recent internet-based trial reporting a drop-out rate of over 90% in a sample of 3000 individuals.14 Drop-out is likely due in part to the need to access a WiFi connection and dependence on an immobile device (eg, a desktop computer). A potential advantage to research using mobile devices (smart phones and tablets) is that data can be collected anywhere at any time. These devices also facilitate passive data collection, such as Global Positioning System (GPS) and accelerometer information and media usage, to gauge social and physical activity that can supplement self-reports. However, while mobile technology may be able to further expand the reach of clinical research, this approach has yet to be tested.

The purpose of this study is to determine the feasibility of conducting a fully remote RCT using smart devices in depressed adults 18 years old and older. We elected to study depression as our clinical focus given its ubiquitous presence in mental illnesses and disability.15 It is the leading cause of disability worldwide,16 ,17 and the enrolment of depressed individuals into clinical trials is difficult.18 ,19 In this paper, we report data on population access (sample representativeness), engagement assessment and cost to complete the study.


Ethical approval for the trial was granted by the UCSF Committee for Human Research.


To test our hypotheses about access, we used three different types of recruitment approaches: traditional, social networking and search engine-based methods. Traditional methods were written ads placed in city buses, newspapers and Craigslist postings throughout the USA. Social networking methods included regular postings on sites such as Facebook and Twitter, and contextual-targeting methods to identify and directly push recruitment ads to potential participants, based on their Twitter and other social media comments. Our search engine-based method included using Google Adwords, a historically successful recruitment tool.20 Each approach (described further in the online supplementary materials) provided potentially interested participants a link to our custom study website (

Participant eligibility

Participants had to speak English, be 18 years old or older, own a smartphone (iPhone or Android) with WiFi or 3G/4G capabilities, and own an iPad2 or newer device. iPad ownership was required as our cognitive assessment tool was only available on this device at the time of the study. To characterise recruiting logistics without this restriction, individuals without an iPad but with a smartphone were given the opportunity to participate in phone-only study arms that were not part of the randomised sample. A Patient Health Questionnaire (PHQ-9)21 score of 5 or greater, or a score of 2 or greater on PHQ item 10 (indicating that they felt disabled in their life because of their mood), was also required for enrolment.
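The eligibility rules above can be expressed as a short screening function. The following Python sketch is illustrative only and is not the study's screening code: the `Screening` record, its field names, the 0-3 coding of PHQ item 10, and the assumption that phone-only participants had to meet the same language, age and mood criteria are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    # Hypothetical screening record; field names are illustrative.
    speaks_english: bool
    age: int
    owns_smartphone: bool        # iPhone or Android with WiFi or 3G/4G
    owns_ipad2_or_newer: bool
    phq9_total: int              # PHQ-9 total score
    phq_item10: int              # impairment item, assumed coded 0-3


def meets_mood_criterion(s: Screening) -> bool:
    # PHQ-9 score of 5 or greater, OR 2 or greater on PHQ item 10.
    return s.phq9_total >= 5 or s.phq_item10 >= 2


def assign_track(s: Screening) -> str:
    """Return 'randomised', 'phone_only' or 'ineligible'."""
    if not (s.speaks_english and s.age >= 18 and s.owns_smartphone):
        return "ineligible"
    if not meets_mood_criterion(s):
        return "ineligible"
    # Without an iPad, participants could join the non-randomised phone-only arms.
    return "randomised" if s.owns_ipad2_or_newer else "phone_only"
```

For example, `assign_track(Screening(True, 30, True, False, 3, 2))` returns `"phone_only"`: the item 10 score satisfies the mood criterion, but the participant does not own an iPad.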



Potential participants were directed to a website ( explaining the study purpose and procedures. Interested participants completed an online brief screening consisting of questions about mobile device ownership.


We used a combination of a written consent and custom videos posted on YouTube to explain the study. Participants had to pass a quiz that tested their understanding that the study was voluntary, was not a substitute for treatment and that they were to be randomised. Each question had to be answered correctly before moving on to baseline assessment and randomisation. Eligibility was established after consent was obtained.


Participants were randomised to one of three treatment arms where they viewed a brief video explaining how to download and use the assessment and assigned treatment app. Participants were also given a link to view a custom dashboard of their study progress.


Participants were asked to use their assigned app for 1 month. The first app was a cognitive intervention video game (Project: EVO™, or EVO) designed to modulate cognitive control abilities, a common neurological deficit underlying depression.22 The second intervention was an app based on an evidence-based treatment for depression (problem-solving therapy, or PST).23 The final intervention app, an information control, provided daily health tips (HT) for overcoming depressed mood such as self-care (eg, taking a shower) or physical activity (eg, taking a walk; see online supplementary materials for further descriptions of each).


We used two apps to collect baseline and 4, 8 and 12 weeks of outcome data. The first app, developed by™ was used to collect self-reported mood, function and passive analytics such as communication data (text logs including call/text time, call duration, text length and screen usage), and mobility data (activity type and distance travelled using the phone's accelerometer and GPS). The second app was a mobile cognitive assessment app (Adaptive Cognitive Evaluation (ACE)), to measure cognitive control processes (see online supplementary etable 1). Participants were automatically notified every 8 h for 24 h if they had not completed a survey within 8 h of its original delivery. An assessment was considered missing if it was not completed within this 24 h time frame.
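The reminder and missing-data rule can be made concrete with a small scheduling sketch. This is one possible reading of the rule, not the deployed app's logic: it assumes reminders fall at 8 h and 16 h after delivery and that the 24 h window is counted from the original delivery time. The function names are hypothetical.

```python
from datetime import datetime, timedelta

REMINDER_INTERVAL = timedelta(hours=8)
COMPLETION_WINDOW = timedelta(hours=24)


def reminder_times(delivered_at):
    """Reminder timestamps strictly inside the 24 h window (assumed +8 h and +16 h)."""
    times = []
    t = delivered_at + REMINDER_INTERVAL
    while t < delivered_at + COMPLETION_WINDOW:
        times.append(t)
        t += REMINDER_INTERVAL
    return times


def is_missing(delivered_at, completed_at):
    """Missing if never completed, or completed more than 24 h after delivery."""
    if completed_at is None:
        return True
    return completed_at - delivered_at > COMPLETION_WINDOW


delivered = datetime(2014, 8, 1, 8, 0)   # illustrative delivery time
for t in reminder_times(delivered):
    print("remind at", t.isoformat())
```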

The baseline assessment included the collection of demographics including age, race/ethnicity, marital and employment status, income, education, smart device ownership, use of other health apps, and use of mental health services, including use of medications and psychotherapy. We collected information on mental health status using the PHQ-924 for depression, the generalized anxiety disorder (GAD)-7,22 for generalised anxiety, a four-item mania and psychosis-screening instrument25 and the four-item National Institute on Alcohol Abuse and Alcoholism (NIAAA) Alcohol Screening Test.26 To assess for self-reported disability, we used the Sheehan Disability Scale.27 ,28 We also asked participants to rate their health on a scale of excellent to poor.

Daily assessments were a combination of self-report and passive data collection. Participants completed the PHQ-2 (mood and enjoyment) every morning. The app collected passive analytics daily. Private information such as actual content of voice calls or text messages or emails was not collected.

The 4-week, 8-week and 12-week assessments included the PHQ-9 to measure changes in mood, ACE for changes in cognitive control, and the Sheehan for changes in disability. Participants were also asked this question: ‘since using this app, I feel that I am: (1) much worse, (2) worse, (3) no different, (4) improved, (5) much improved’.


Randomised participants were paid a total of $75 for completing all assessments over the 12 weeks via Amazon gift vouchers, while participants in the phone-only arms were paid $45 as they did not complete the cognitive assessment. To test if increased payment led to increased adherence and retention, a subset of participants (n=144) were given $75 in bonus pay if they completed all assessments.

Procedures to reduce gaming

‘Gaming’ is a situation where a user fraudulently enrols in a study solely to acquire research payment. We utilised the following safeguards to prevent this: (1) locking the eligibility survey if a participant tried to change a submitted answer, (2) using study links that are valid for one user/device, and (3) tracking IP addresses to minimise duplicate enrolment.
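The three safeguards can be sketched as a small in-memory guard. This is a hypothetical illustration, not the study portal's implementation: the class and method names are invented, and rejecting any repeated IP address outright is an assumption (the study reports only that IP addresses were tracked).

```python
class EnrolmentGuard:
    """Illustrative in-memory version of the three anti-gaming safeguards."""

    def __init__(self):
        self.used_links = set()       # single-use study links already redeemed
        self.seen_ips = set()         # IP addresses that have already enrolled
        self.locked_surveys = set()   # eligibility surveys locked after an edit

    def submit_answer(self, survey_id, answers, question, value):
        """Record an answer; lock the survey if a submitted answer is changed."""
        if survey_id in self.locked_surveys:
            return False
        if question in answers and answers[question] != value:
            self.locked_surveys.add(survey_id)
            return False
        answers[question] = value
        return True

    def try_enrol(self, link_token, ip_address):
        """Allow enrolment only for an unused link from an unseen IP address."""
        if link_token in self.used_links or ip_address in self.seen_ips:
            return False
        self.used_links.add(link_token)
        self.seen_ips.add(ip_address)
        return True
```

A production system would persist this state and would probably flag shared IP addresses for review rather than block them, since several legitimate participants can share one network.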

Statistical analyses

To assess participant access, we describe the sample demographics, clinical characteristics and sample comorbidities using the appropriate descriptive statistics. To assess participant engagement, we examined the proportion of study drop-outs and the proportion of enrolled individuals who responded to the primary mood outcome measures at each time point using a mixed-model analysis of variance (with Greenhouse-Geisser corrections when needed). To calculate time to drop-out, we tested a survival analysis model with the distribution of the ‘survival’ times for those assigned each app estimated and non-parametric estimates of the survivor function computed by the Kaplan-Meier method, with curves tested using the log-rank test using Stata V.14.0. We also examined whether there was a significant difference in drop-out rates among the three interventions using Pearson's χ2 test. Pairwise log-rank tests were conducted to determine where there were significant differences between the distributions, and a Bonferroni correction was set at p<0.017 to correct for multiple comparisons. We also compared these outcomes for the entire sample and by sample type (randomised and non-randomised). To assess issues surrounding cost, we describe a total study cost approach factoring in β testing, staff time and efforts beyond those payments made for recruitment and participant remuneration.
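The analyses were run in Stata; the following Python sketch only illustrates two of the ideas named above, the Kaplan-Meier product-limit estimate and the Bonferroni threshold for three pairwise comparisons. The function names and the toy durations are hypothetical and are not study data, and the log-rank test itself is not implemented here.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survivor function.

    durations: days until drop-out or censoring
    events:    1 if the participant dropped out, 0 if censored
    Returns [(day, survival_probability)] for each day with a drop-out.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        dropped = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if dropped:
            survival *= 1 - dropped / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # drop-outs and censored cases both leave the risk set
    return curve


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold."""
    return alpha / n_comparisons


# Toy example: six participants, two still active (censored) at day 84.
toy_days = [7, 14, 14, 28, 84, 84]
toy_dropped = [1, 1, 1, 1, 0, 0]
for day, s in kaplan_meier(toy_days, toy_dropped):
    print(f"day {day}: S(t) = {s:.3f}")

# Three pairwise comparisons among EVO, PST and HT.
print(f"Bonferroni threshold: {bonferroni_alpha(0.05, 3):.3f}")
```

With three arms there are three pairwise comparisons, so 0.05/3 ≈ 0.017, which matches the threshold reported above.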



Recruitment rate

National recruitment began in August of 2014, and was conducted in five, 2-week advertising waves (total of 5 months of recruitment). We recruited a total of 2923 participants. Of these recruited individuals, 1098 were enrolled to the randomised (N=626) and non-randomised (N=472) arms of the study (see figure 1). Eighty-nine per cent of the sample came from traditional recruitment approaches, <1% came from social networking, <1% came from search engine-based methods, and 10.3% came from unanticipated means (own search, referrals). We were able to successfully recruit individuals from 8 of the 15 most rural states in the USA29 without any targeted recruitment efforts (see figure 2A).
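The headline recruitment figures can be checked with simple arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative.

```python
screened = 2923
enrolled_randomised = 626
enrolled_phone_only = 472
recruitment_months = 5

enrolled = enrolled_randomised + enrolled_phone_only
enrolment_rate = enrolled / screened

print(f"enrolled: {enrolled}")                                        # 1098
print(f"enrolment rate: {enrolment_rate:.1%}")                        # 37.6%
print(f"enrolled per month: {enrolled / recruitment_months:.0f}")     # 220
```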

Figure 1

CONSORT diagram (HT, health tips; PHQ, Patient Health Questionnaire).

Figure 2

Demographic characteristics. (A) Percentage of recruited participants across the USA. (B) Percentage of participants within different age ranges from the recruited sample. (C) Ethnic composition of the recruited individuals, and its comparison to the observed ethnic composition reported in the 2013 US Census.

Sample demographics

Participants were primarily young adults (see figure 2B), although age ranged from 18 to 76, with 79% identifying as female. Fifty-eight per cent of our sample was non-Hispanic white, with an ethnicity distribution comparable to the 2013 US Census (see figure 2C). Fifty-seven per cent of our participants had obtained a 4-year college degree or higher, with a mean annual income of $30 000–$35 000 (see table 1). Sixty-seven per cent of our participants were employed at the time of enrolment. There was a difference in age between randomised and phone-only assigned participants, with randomised participants slightly older than phone-only participants, t(954.38)=−3.22, p=0.001. However, there was no difference in gender between these groups (χ2=0.08, p=0.77). Enrolled individuals who were single/never married reported greater symptoms of depression (t(528.17)=2.96, p=0.003).

Table 1

Baseline demographics

Clinical characteristics

The sample was moderately depressed at baseline, with a PHQ-9 mean score of 13.9 (SD=5.1). There was a significant association between age and depression severity, Spearman's r=−0.11, p<0.001. There was no significant difference in depression severity between genders (t(365.85)=0.63, p=0.53) or among ethnic groups (F(6, 1091)=1.37, p=0.22). Fifty-one per cent reported comorbid anxiety, 53% reported comorbid alcohol misuse and 16% reported a history of psychosis or mania. In total, 54.5% of our sample was receiving mental health treatment for their depression. This sample mirrored the ethnic disparities in mental health service use found in the general population, with 63% of non-Hispanic white participants in treatment compared with only 42% of ethnic minority participants (χ2=28.6, p<0.001, OR=2.29). There were no significant differences in depression severity among individuals randomised to the three primary arms (F(2, 623)=0.14, p=0.87; see table 1).


Sixty-six per cent of the sample completed the 4-week assessment, 50% completed the 8-week assessment and 41% completed the 12-week assessment (see figure 3A). There was no adherence difference by group (F(2, 241)=2.50, p=0.08) and no time by group interaction (F(3.55, 428.14)=1.93, p=0.11). We found similar adherence to the cognitive assessment tool, with neither a group (F=0.46, p=0.63) nor interaction effect present (F=0.91, p=0.42). Although lower assessment adherence was observed in the more depressed participants, younger participants, and participants with lower education, the effect sizes were small (see table 2).

Table 2

Baseline demographics*

Figure 3

Intervention and assessment adherence. (A) Percentage of individuals who responded to their mood assessment during the treatment phase (first 4 weeks) and follow-up periods (weeks 8 and 12). (B) Kaplan-Meier survival estimates per study arm illustrating survival distributions of time to drop-out (last day of recorded activity) over the course of the study (84 days).

Kaplan-Meier survival analysis was conducted to determine whether intervention assignment or any baseline demographic variables predicted drop-out status. The log-rank test revealed a significant difference in the survival distributions among groups (χ2=19.27, p<0.001), with the EVO arm having significantly earlier time to drop-out than the PST (χ2=7.45, p=0.01) or HT (χ2=17.51, p<0.001) arms (see figure 3B). We did not find a significant difference in survival distributions for those with high versus low PHQ-9 scores (using a PHQ-9 score of 10 as a cut-point, χ2=2.29, p=0.13). There was no significant difference in survival distributions between non-Hispanic whites and ethnic minorities (χ2=2.13, p=0.14). Participants who received bonus pay remained in the study longer than those who did not receive a bonus (χ2=11.82, p<0.001). Bonus pay was for assessment completion, not intervention app use.


Total study costs included participant payments ($23 320), website/enrolment portal/database development ($46 507), and salaried staff time (3; 2 student volunteers also assisted) over the 9 months the study was active ($58 917), summing to a total of $128 444. The total amount spent on this project, including staff costs, development and β testing of the UCSF developed apps (ACE and iPST), and licensing fees for the use of the other apps (EVO and was $314 264 over 2 years.
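A per-participant view of these costs can be derived from the stated totals and the 1098 enrolled participants. The sketch below is an illustrative calculation, not a figure reported by the authors; the constant and function names are hypothetical. It uses the stated totals as printed; note that the three itemised amounts as printed add up to $128 744, which is $300 more than the stated $128 444.

```python
ENROLLED = 1098
STATED_STUDY_COST = 128_444     # payments + portal/database + staff, as stated
STATED_TOTAL_COST = 314_264     # including app development, β testing and licensing


def cost_per_participant(total_cost, n_participants):
    """Average cost per enrolled participant, in dollars."""
    return total_cost / n_participants


print(f"study cost per participant: ${cost_per_participant(STATED_STUDY_COST, ENROLLED):.2f}")
print(f"total cost per participant: ${cost_per_participant(STATED_TOTAL_COST, ENROLLED):.2f}")
```

On these figures the average is roughly $117 per enrolled participant for running the study, and roughly $286 once app development and licensing are included.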


The results from this study have a number of important implications for the future of RCTs in mental health. First, we recruited a large sample of depressed participants in a short period of time and with minimal cost and effort. Currently, the typical RCT takes 4–5 years to complete, and another 1–2 years before the outcomes from these trials are reported publicly. Rapid recruitment has the potential for quickly testing intervention efficacy and effectiveness, and ultimately moving effective treatments into practice while identifying and preventing the proliferation of ineffective, even unsafe, treatments. Second, we were able to recruit a highly representative sample of the US population, without any specific cultural adaptations or targeted advertising. Remote research methods could address decades-long concerns about the generalisability of clinical findings to minority samples not typically represented in clinical research. Finally, the cost of a fully remote RCT could allow for greater distribution of dwindling clinical research and development funds from federal, foundation and industry sponsors. Investment in large-scale clinical trials is a costly endeavour, resulting in the need to focus funds on only a few research areas. Although not all mental health RCTs should be fully remote, particularly those that test hypotheses about biological or neurological processes that can only be measured with immobile devices, the methods presented here, such as automated data collection of neuropsychological processes, could result in substantial savings, which in turn could be invested in a diverse research portfolio.

This research method is not without its limitations. Primary among the challenges of fully remote research is the ability to keep participants engaged in the study protocol over time. Although there is an appeal to quickly recruiting and retaining large numbers of participants in an RCT, researchers and developers need to be cautious when interpreting outcomes from samples with a drop-out rate greater than 70%.30–33 However, it is important to point out here that the project was completely automated with very little contact between the participant and research team, and our retention rates were higher than in the typical internet-based RCT.34 Internet-based studies have shown that when there is more direct contact between the research team and participant through technologies such as video-over-internet protocols, retention rates are greater and less subject to bias,35 suggesting a hybrid approach may provide an optimal response.

Although we experimented with two incentive models to encourage retention, we determined that participant payment was not enough to keep engagement from waning across the course of the study, as bonus pay only encouraged participants to complete their assessments, and did not engender any additional motivation to utilise the training apps. Previous work has demonstrated that externalised benefits in the form of compensation can dull motivation,36 ,37 indicating that the creation of internalised reward structure to enhance motivation (eg, individualised presentation of study progress, personalised encouragements) is critical for improved adherence.


Mobile technology has an important role in broadening the reach and representativeness of RCTs, while substantially reducing the time to determine intervention effectiveness and reducing study costs. Although study retention remains challenging for technology-based research, innovative methods to increase motivation and study engagement could easily address this important limitation.


The authors thank A Brandes-Aitken (UCSF), M Brander (UCSF) and A Bodepudi (UCSF) for assistance in data monitoring, M Gross (UCSF) and J Camire ( for their help in participant recruitment, J Steinmetz ( for database architecture, D Ziegler (UCSF) for helping with app deployment, D Albert (UCSF) for assistance in web design, L Slavik (UCSF) for accounting assistance, C Catledge (UCSF) for administrative oversight, A Piper, E Martucci, S Kellogg, J Bower, M Omerick and the entire Akili Interactive team as well as I Elson, L Kaye, S Goobich and the rest of the team for helping with data collection and partnering with us on this project. They also thank all the participants whose time and efforts made this work possible. Support for this research was provided by the National Institute of Mental Health (P.A.A.; R34-MH100466, T32 MH0182607, K24 MH074717). JAA and JTJ had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.




  • Competing interests AG is co-founder, chief science advisor and shareholder of Akili Interactive Labs, a company that develops cognitive training software. AG has a patent pending for a game-based cognitive training intervention, ‘Enhancing cognition in the presence of distraction and/or interruption’, on which the cognitive training application (PROJECT: EVO) that was used in this study was based.

  • Ethics approval UCSF Committee for Human Research.

  • Provenance and peer review Not commissioned; externally peer reviewed.