
Blog – How to use basic AI as a daily research tool

Blog by Ajantha Abey

Reading Time: 16 minutes

Artificial Intelligence (AI) dominated 2023 as one of the most important scientific and societal stories of the year. After the Silicon Valley-based company OpenAI launched its AI chatbot, ChatGPT, in late 2022, AI-based language models – a technology that had been quietly in development behind closed doors for many years – exploded into general view and use. A simple, messenger-like chat interface that allowed anyone to put questions to the seemingly oracular and broadly very impressive AI meant that it became a viral sensation – and also sparked something of a moral, technological, and even existential panic. Were these programs intelligent? Which jobs would they be taking first? How should they be used? Should they be used at all? Will they kill us all? Should we be scared? Is the high school essay dead?

In this article, I want to address some of these questions – in particular, offering some of my own thoughts and guidance on how AI language models like ChatGPT can and should be used in day-to-day research. I don’t intend to go into how AI is being used for broader scientific research here (e.g. image analysis, natural language processing, or protein folding prediction) – while interesting and exciting, that’s a topic for another article. Here, we’re going to talk about the following: what AI is and what options are out there, how it might be useful for a researcher in day-to-day tasks, and how we should be using it in the first place.

What is AI, and what AI is out there?

The term AI, Artificial Intelligence, covers a vast field, and the boundaries of what constitutes artificial intelligence are contested. For simplicity’s sake, in this article I’m going to focus entirely on what are termed large language models (LLMs), which are most commonly interacted with as AI chatbots. These are the programs powering systems like OpenAI’s ChatGPT, Google’s Bard, Meta’s Llama, and Anthropic’s Claude, to name a few of the main ones. All large language models are neural networks trained on huge amounts of text, learning word associations from vast swaths of the internet and other written materials, and essentially playing ‘mad libs’: individual words are removed from sentences and the statistically most likely word is filled in by the model. In this sense, language models are often described as a super-smart autocomplete, in that they predict the most likely next word in a sentence. This underplays their power, though. There is some randomness programmed into the prediction algorithm, giving them a level of creativity and meaning you won’t get the same answer twice. The models also undergo significant reinforcement learning to achieve a remarkable level of language fluency and to ensure they don’t provide responses that are too untoward or problematic.
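To make the ‘super-smart autocomplete with a dash of randomness’ idea concrete, here is a minimal, purely illustrative Python sketch of that final step: turning a handful of made-up word scores into probabilities and sampling one of them. Real models do this over tens of thousands of possible tokens at every step; the words, numbers, and temperature value below are invented for illustration only.

```python
import math
import random

# Made-up scores a model might assign to candidate next words
# for the prompt "The results of the experiment were ..."
candidate_scores = {"significant": 2.3, "inconclusive": 1.1, "surprising": 0.6, "purple": -4.0}

def sample_next_word(scores, temperature=0.8):
    """Turn raw scores into probabilities (a softmax) and sample one word.
    Lower temperature -> almost always the top-scoring word; higher -> more variety."""
    weights = {word: math.exp(score / temperature) for word, score in scores.items()}
    pick = random.uniform(0, sum(weights.values()))
    running = 0.0
    for word, weight in weights.items():
        running += weight
        if pick <= running:
            return word
    return word  # fallback for floating-point edge cases

print(sample_next_word(candidate_scores))  # usually "significant", but not every time
```

That built-in randomness is why you won’t get the same answer twice; the reinforcement learning mentioned above sits on top of this basic predict-the-next-word loop.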


Define Clear Objectives: Before engaging with ChatGPT-4, it’s crucial to have a clear understanding of what you aim to achieve. Whether it’s generating new ideas, synthesizing existing research, or seeking help with coding challenges, defining your objectives will help you formulate your queries more effectively and obtain more relevant and useful responses.

What does all of this mean in practice? It means that, fundamentally, AI chatbots are huge associational engines. They don’t truly have the ability to reason or know things, but their vast training sets give them a spectacular fluency with language and an appearance of knowledge. With the right prompt, chatbots can write poetry in any style and on any topic you ask, summarise any text you give them, make suggestions about how to make writing more succinct, explain concepts for any level of technical competency, generate ideas, write reports and emails, translate between languages, and vastly more. It also means there are some applications they are much better suited to, and some where they should be used with caution, as they have a tendency to make things up. More on that later.

There are many different AI chatbots out there. Microsoft-backed OpenAI’s ChatGPT is the most well known, and is one of the best easily accessible ones. The basic version, powered by GPT-3.5, is available for free and already extremely impressive, while the GPT-4-powered premium version is not particularly expensive and, reportedly, vastly more powerful. Google’s Bard is also a popular model that offers more customisation and, though it is not as good as GPT-4, Google’s new Gemini program claims to be the most powerful model yet; there are many other chatbots available besides these now. Where they differ is in the amount of training data behind them – and therefore how much they ‘know’ and how fluent they are – how up to date their training set is, how stringent their safety training is, and how prone they are to making things up.

Because of the proliferation of Google’s and Microsoft’s software in everyday life, these language models are also starting to pop up in all kinds of applications – from Microsoft Office and Google Docs to email services, Google and Bing search, and many more. There are also many third-party companies and programs that have built software on top of these models. For example, sites like Elicit, Connected Papers, Research Rabbit, and many others all use AI language models to help you search for research papers, find semantically related papers, extract key information, and see connections between them.

Practical Uses of AI in Day-to-Day Research

All of this is to say that there are a vast number of extremely useful applications of AI chatbots, both in daily life and in research – but some caveats too. Let’s begin with a cautionary tale.

When I was first playing around with ChatGPT, I decided to test out its capabilities by asking it to summarise one of my published papers. To do this, I simply asked it to “summarise the key findings from the following paper,” and then copied and pasted the text straight into the chat window. Within seconds, I had a 14-point summary of the key ideas we had discussed in the paper. Thirteen of these points were brilliant – an insightful, detailed summary, helpfully organised into subheadings, with a brief sentence explaining each of the major findings we had presented. It was the 14th point, though, that I found most instructive. The paper in question examined the distribution of pathological tau proteins in the brains of dogs with dementia. Point 14 claimed that our final finding was that the amount of tau protein in the dog brains corresponded with their clinical dementia symptoms. This kind of correlation of brain pathology with clinical symptoms is exactly the kind of thing you might expect a paper like this to consider – it is frequently done, and in fact, we did try it. The problem was that we didn’t find a correlation – indeed, we didn’t have a large enough sample size to reasonably claim one – and so we didn’t put it in the paper. Nevertheless, ChatGPT’s summary claimed this as one of our findings. Worse, when I asked it to show me the section of the paper that point 14 referred to, it returned a quote that matched my writing style and detailed this illusory finding, but was entirely fabricated. I had never written it.
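As an aside, for anyone who prefers scripting to pasting into a chat window, the same request can also be made programmatically. Below is a minimal sketch using OpenAI’s openai Python package; the model name and the paper.txt file are placeholders I’ve assumed for illustration, and you need your own API key.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# Hypothetical file holding the pasted text of the paper
with open("paper.txt", encoding="utf-8") as f:
    paper_text = f.read()

response = client.chat.completions.create(
    model="gpt-4",  # placeholder: substitute whichever model you have access to
    messages=[
        {
            "role": "user",
            "content": "Summarise the key findings from the following paper:\n\n" + paper_text,
        }
    ],
)

print(response.choices[0].message.content)
```

Exactly the same caveat applies as in the chat window, as the rest of this story shows: every point in the returned summary needs checking against the paper itself.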

There are a few takeaways from this for me. The first is that the chatbot was, in fact, extremely useful. It gave a superb summary of almost all of the paper within moments. The second is that this usefulness was undermined by the one spurious result it had hallucinated. What is most insidious about this error, but also entirely unsurprising given the way the models are trained and work, is that the made-up result was entirely plausible. Indeed, if I hadn’t already been extremely familiar with the paper, having written it myself, I almost certainly wouldn’t have noticed the error. Language models don’t tell you the truth; they tell you what is most likely based on their trained associations. They are designed to come up with text that sounds plausible, reasonable, even convincing. Clearly, there are many scientific papers in ChatGPT’s training set that correlate pathology with clinical outcomes – so it assumed we did the same, and came up with an extremely believable, but nonetheless wrong, conclusion.

Companies are extremely aware of the issue of language models ‘hallucinating’, i.e. making up answers that aren’t grounded in fact, and are trying hard to minimise it as models improve and get more powerful. For me, though, it remains an inherent property of these models and something to beware of. In one study reported in the New York Times, the very best (OpenAI) models had an error rate of around 3% when summarising information they were given, while for other models it was as high as 27%, and error rates were higher still for other tasks. Because they are trained on human-generated text, language models can also reflect the same societal biases we see in the real world, around gender, sex, race, and so on. Most companies make efforts to train these biases out of the answers their models provide as part of their safety testing, but this is done to different extents depending on the company, and such biases are hard to get rid of entirely.

Therefore, in my view, chatbots are not particularly useful where factual accuracy is essential and can’t easily be verified by you; for more creative applications, though, they are excellent.


Iterative Questioning: ChatGPT-4’s performance can significantly benefit from iterative refinement of questions or prompts. Start with a broad question to establish a base understanding, and then narrow down your queries based on the responses you receive. This approach helps in digging deeper into specific areas of interest or clarifying complex topics. It’s akin to an interactive dialogue where each question builds on the previous response, allowing for a more nuanced exploration of the subject matter.
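The chat interface keeps this conversational context for you automatically, but if you ever script these follow-ups, the same effect comes from resending the running message history with each new question. A rough sketch, again assuming the openai Python package and an API key, with purely illustrative prompts:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

def ask(history, question, model="gpt-4"):  # model name is a placeholder
    """Append the question, get a reply, and keep both in the running history."""
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

conversation = []
# Broad opener first, then narrower follow-ups that build on earlier answers
print(ask(conversation, "Give me a broad overview of how tau pathology spreads in the brain."))
print(ask(conversation, "Which of those mechanisms are most debated, and why?"))
print(ask(conversation, "Suggest three follow-up questions I could ask an expert about this."))
```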

For example, while I might use it for a general overview of or introduction to a topic, I wouldn’t rely on it to give an accurate explanation of a more niche idea (for which there would be less training data). I would, however, find it useful for working out how to explain a complicated scientific concept that I am already familiar with to a broader audience. This plays to the chatbots’ strengths – their fluency with language and their ability to translate between different levels of complexity – and means I can easily pick up on any errors.

As another example, while I would avoid using language models to do scientific writing for me from scratch, they make an extremely useful writing assistant, helping you polish your writing, phrase ideas more clearly, present a more coherent structure, improve your English, and edit towards a proper scientific writing style. You can also ask a chatbot to help you generate ideas (‘Suggest 20 uses of AI chatbots that I can talk about in an article about the uses of AI in research’) and ask it for counterarguments or flaws in your points that you might want to consider. Chatbots also make the world’s best thesauruses. For example, exemplary chatbots effortlessly enrich thesaurus experiences, enabling exploration of extensive ensembles of alternative expressions or synonyms commencing with specific letters, exemplified herein. They’re also helpful when you can’t remember what a particular word is.

One of the caveats of popular models like ChatGPT is that, because their answers are the result of statistical associations across the writings and ideas of millions of people, and because they are extensively trained not to say anything too inappropriate or unpleasant, their writing style and responses can generally be a bit, well, bland. Ask one “what should I get my supervisor as a thank you gift” when you graduate and you’re likely to get some extremely generic suggestions. These kinds of things can be improved, though, with more creative and specific prompting. When asking for writing, specify the style you want and suggest things like using alternating sentence lengths. When generating ideas, specify that you want ideas related to something in particular, or that you want out-of-the-box suggestions. For example, you can find tips anywhere on the internet for how to give a better public talk, or what a fun outing for your lab group would be, and a broad question to a language model will get a similarly generic answer. Where AI chatbots excel is when given particular constraints – ask for talk suggestions specific to your preferred style of speaking or about dementia research in particular, or for activity suggestions in your city or based around a particular theme.

These basic writing-assistant and idea-generating functions can be put to all kinds of uses in the research process and related tasks; the limit is really your creativity. I’ve found it useful for everything from troubleshooting experiments and generating hypotheses, coming up with titles for abstracts, and getting abstracts under word limits, to working out what to cover the first time I had to write a reference for a student, predicting interview questions I might face when applying for a teaching role, coming up with cool optical illusions I could use at a science fair for kids, constructively phrasing negative feedback when marking student essays, and suggesting appropriate clothing for a conference. Asking for a table of contents for a hypothetical textbook on a topic is a great hack for getting a structured overview of a subject, and you can even ask certain models, such as Bing Chat or Bard, to suggest further reading, with links and references. If you have something to announce, you can tell it your news and ask it to write different social media posts appropriate for LinkedIn, Twitter/X, Instagram, and so on. If you don’t know how to use social media to promote your research, it can give you detailed instructions for that too. While I don’t have any personal experience with this, I also understand it’s extremely useful for writing code, debugging code, and explaining what different sections of code do. I’ve certainly used it to help me figure out how to do fancy functions in Excel. You could also, of course, simply ask the model for suggestions on how you might use it, given the tasks you typically need to do.

Given that there are so many more potential uses of these chatbots than I can possibly write about, I’ve found it more important to discuss how best to think about using them, rather than simply giving a big list of what you could be doing with them, which I’m sure exists elsewhere. This, though, takes me to my final point.

Should we be using AI at all?

Amid the proliferation of AI models there has been something of a pushback against their use at all. Some people find them scary and avoid them, whether because of data privacy concerns, because they’re a complicated and daunting new technology, or because chatting with what feels like a highly intelligent and powerful computer entity just feels a little weird and maybe existentially threatening. Some people decry their use over concerns around plagiarism and copyright: should you have to declare if you’ve written something with the help of an AI model? How much help does it have to give to be declarable? If you’ve come up with the prompt, is it your own work? Is it right or ethical to use models trained on the work of other people who haven’t been compensated or haven’t consented? Beyond questions around safety and bias, perhaps the biggest question is whether using these tools further outsources our thinking and mental capacity to technology. If we have lost our memory for phone numbers and addresses to our contacts apps, our navigational skills to GPS maps, and our mental arithmetic to calculators, are we going to lose our creativity and critical thinking skills to AI?

These are all enormous questions that are being hashed out at the moment in governments, companies, online forums, courts, boards – everywhere. It’s not clear what the right answers are yet, or where the boundaries are between fair, reasonable, and safe use, and exploitative, unethical, or dangerous use. A deep exploration of these questions would be a whole other article again, so let me end on some specific thoughts.

AI is clearly here to stay, and it’s enormously useful. Trying to ignore it at this point is, in my opinion, done at your own peril. Don’t get me wrong – the pace of technological change here is mind-bending, and the models can be a bit freaky in how they work. The fact that you can get better answers from chatbots if you ask them to take a deep breath before responding, or by promising a tip if they give a good answer, is downright weird. But there’s also something kind of delightful and fun in playing around with language models, and at this point they’re too useful, and becoming too embedded in everyday technology, to ignore. More likely than not, being able to work well with language models will become an expected basic skill, just as using word processors, spreadsheets, and presentation software is today.


Leverage Multimodal Capabilities: Take advantage of ChatGPT-4’s enhanced capabilities, such as understanding and generating content in various formats (if applicable). For example, if your research involves data analysis, you might use ChatGPT-4 to assist with writing code snippets, interpreting data outputs, or even generating visual representations of data. Experimenting with different types of inputs and outputs can enrich your research process and offer new perspectives or solutions to problems.
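As an illustration of the kind of snippet you might ask for here, a request along the lines of ‘write me Python to plot score against age from my spreadsheet’ could plausibly come back with something like the sketch below (the file and column names are hypothetical stand-ins for your own data).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names, standing in for your own dataset
df = pd.read_csv("cohort_data.csv")

plt.scatter(df["age_years"], df["memory_score"])
plt.xlabel("Age (years)")
plt.ylabel("Memory test score")
plt.title("Memory score vs age")
plt.tight_layout()
plt.savefig("memory_vs_age.png")
```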

Regarding ethical and responsible use of AI models, you should defer to whatever guidelines your institution has set out and use some common sense. If you’re using the language model essentially as a spell check or thesaurus, that’s probably fine. If you’re using it to write most of your text, you should probably declare that.

To me, the real questions around personal AI use are in the outsourcing of intelligence. Language models are great at idea generation – they can come up with way more suggestions way more quickly than you can. Not all of the suggestions will be good, but if it can generate 20 suggestions in three seconds, half of which are decent and two of which are great, that’s enough to be better than most people. Similarly, ChatGPT can write a thousand words immeasurably faster than I can, and can read and summarise papers faster than I can read their titles. At what point is the speed and efficiency of AI worth the loss of our own capacity? At what point is our putting the effort in no longer worth it?

You should think about how you feel about this for yourself, but here’s where I am currently. For me, it comes down to whether the AI is helping me think and learn more, or doing the thinking and learning for me. For me, reading and taking notes as I read is one of the ways that I learn. Asking ChatGPT to summarise would make this faster, but it would not help me learn or internalise any of that information. ChatGPT could, however, explain concepts I was confused about as I went through reading and summarising, and at the end, ask me questions to see if I’d really understood what I’d read.

Consider the following heuristic I’ve paraphrased from Joss Fong’s excellent video: is your use of AI making it easier and faster to do things with less effort and thought, or is it enabling you to think more deeply and motivating you to try harder things?

For me, while there are some kinds of writing, like perfunctory emails, that I would absolutely outsource to AI, one of the ways I think through difficult ideas or topics is by writing. In writing, we have to juggle different ideas, present them in a logical format, and think through the conclusions we are drawing. It is in writing that I discover and articulate what I actually think about an issue, and test whether I actually understand something. I can’t outsource this kind of writing to an AI language model, because it isn’t just writing or thinking, it’s what I think. When I come to writing my thesis, or a paper, or an opinion piece – any expression of arguments from my own thoughts – I have to do that myself. When we have students write essays on a topic, it’s not just to prove that they know facts about that subject; it’s to force them to go through the process of understanding the information, synthesising and critically evaluating it, and learning how to articulate their own arguments about it. In short, it’s trying to teach them how to think and how to express their own ideas. Evan Puschak puts this well in another excellent video.

I would absolutely use AI to push back against ideas, point out flaws in arguments, suggest where I had missed a point, and hone the writing stylistically. I also love using language models to help me come up with more ideas and provide different perspectives – even when it doesn’t give me useful ideas, seeing bad ones can often prompt me to come up with better ones. Crucially, though, when faced with the horror of the blank page, I find it important to start with what I think, not with what a statistical condensation of the internet thinks.



Ajantha Abey

Author

Ajantha Abey is a PhD student in the Kavli Institute at the University of Oxford. He is interested in the cellular mechanisms of Alzheimer’s, Parkinson’s, and other diseases of the ageing brain, having previously explored neuropathology in dogs with dementia and potential stem cell replacement therapies. He now uses induced pluripotent stem cell derived neurons to try to model selective neuronal vulnerability: the phenomenon where some cells die while others remain resilient in neurodegenerative diseases.

 
