If ChatGPT, OpenAI’s feisty new chatbot, wrote this story, it would say:
“As companies seek to streamline their operations and increase productivity, many are turning to artificial intelligence tools like ChatGPT to help their employees get tasks done. But can workers really count on these AI programs to take on increasing responsibilities, or will they eventually fall short of expectations?”
Not great, but not bad, right?
Employees are experimenting with ChatGPT for tasks like writing emails, producing code, or even completing a year-end review. The bot uses data from the web, books and Wikipedia to generate conversational responses. But the technology isn’t perfect. Our testing found that its answers sometimes contain plagiarism, contradict themselves, are factually incorrect or have grammatical errors, to name a few problems, all of which could be troublesome at work.
ChatGPT is basically a predictive-text system, similar to but more powerful than the ones built into the text-messaging apps on your phone, said Jacob Andreas, an assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory who studies natural language processing. While that often produces answers that sound great, the content can have some issues, he said.
“If you look at some of these really long essays generated by ChatGPT, it’s very easy to spot where it contradicts itself,” he said. “If you ask it to generate code, it’s mostly correct, but often there are errors.”
We wanted to know how well ChatGPT can handle everyday office tasks. Here’s what we found after testing across five categories.
We asked ChatGPT to respond to several types of incoming messages.
In most cases, the AI gave relatively appropriate answers, although most were verbose. For example, when I asked it to respond to a colleague’s question on Slack about how my day was going, its reply was repetitive: “@[Colleague], Thanks for the question! My day is going well, thanks for asking.”
The bot often left phrases in parentheses when it wasn’t sure what, or whom, it was referring to. It also assumed details that weren’t included in the prompt, which led to some factually incorrect statements about my job.
In one instance, it said it couldn’t complete the task, telling me it was “unable to receive and reply to emails.” But when I made the request more general, it generated a response.
Surprisingly, ChatGPT was able to generate sarcasm when prompted to respond to a colleague asking whether Big Tech is doing a good job.
People use generative AI to develop new ideas, among other things. However, experts warn that people should be careful when using ChatGPT for this at work.
“We don’t understand to what extent this is just plagiarism,” said Andreas.
The possibility of plagiarism was clear when we asked ChatGPT to come up with story ideas on my beat. One pitch in particular was for a story idea and angle I had already covered. Whether the chatbot was drawing on my previous stories, on others’ coverage, or simply generating an idea from other data on the internet is unclear, but the fact remained: the idea was not new.
“It’s good at sounding human, but the actual content and ideas are usually familiar,” said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management who studies the impact of artificial intelligence on work. “These are not new findings.”
Another idea was out of date, exploring a story that would be factually inaccurate today. ChatGPT says it has “limited knowledge” of anything after 2021.
Providing more details in the prompt led to more focused ideas. However, when I asked ChatGPT to write some “quirky” or “funny” headlines, the results were shocking and at times nonsensical.
Master difficult conversations
Have you ever had a colleague who talks too loudly while you’re trying to work? Maybe your boss holds too many meetings, cutting into your focus time?
We tested ChatGPT to see if it could help handle tough workplace situations like this. For the most part, ChatGPT produced appropriate responses that could serve as good starting points for workers. However, they were often wordy, formulaic, and in one case completely contradictory.
“These models don’t understand anything,” Rahman said. “The underlying technology looks at statistical correlations… So you get formulaic answers.”
A layoff notice it created could easily stand up against, and in some cases perform better than, the notices companies have sent out in recent years. Unprompted, the bot cited the “current economic climate and the impact of the pandemic” as the reason for the layoffs and said the company understood “how difficult this news can be for everyone.” It suggested laid-off workers would have support and resources and, as prompted, motivated the team by saying they would “come out stronger.”
For difficult conversations with colleagues, the bot greeted them, gently addressed the issue, softened the delivery by saying “I understand” the person’s intent, and ended the note with a request for feedback or further discussion.
But in one instance, when it was asked to tell a colleague to lower the volume of their phone calls, it completely misunderstood the request.
We also tested whether ChatGPT could generate team updates if we fed it the key points that needed to be communicated.
Our first tests again produced appropriate answers, albeit formulaic and somewhat monotonous. However, when we specified an “excited” tone, the wording loosened up and included exclamation points. But each memo sounded very similar even after we changed the request.
“It’s both the structure of the sentence and the connection of the ideas,” Rahman said. “It’s very logical and formulaic … it’s kind of like a high school essay.”
As before, it made assumptions when it lacked the necessary information. It became problematic when it didn’t know which pronouns to use for my colleague, a mistake that could signal to colleagues that either I didn’t write the memo or I don’t know my team members very well.
Writing a self-assessment at the end of the year can cause dread and anxiety for some, resulting in a review that sells their accomplishments short.
Feeding ChatGPT clear accomplishments, including key data points, produced an enthusiastic self-evaluation. The first attempt was problematic, though, because the initial prompt asked for a self-evaluation for “Danielle Abril” rather than for “me.” That led to a third-person review that sounded like it came from Sesame Street’s Elmo.
Changing the prompt to ask for a review of “me” and “my” accomplishments yielded complimentary phrases such as “I have consistently demonstrated a strong ability,” “I am always willing to go the extra mile,” “I have been an asset to the team” and “I am proud of my contribution.” It also included a look to the future: “I am confident that I will continue to make valuable contributions.”
Some of the highlights were a bit generic, but overall it was a glowing review that could serve as a good rubric. The bot achieved similar results when asked to write cover letters. However, ChatGPT had a major flaw: it mistakenly assumed my job title.
So, was ChatGPT helpful for general work tasks?
It helped, but sometimes its errors caused more work than doing the task manually.
ChatGPT served as a good starting point in most cases, providing helpful phrasing and initial ideas. But it also produced answers that contained errors, factually incorrect information, excess words, plagiarism and misunderstandings.
“I can see it being useful … but only insofar as the user is willing to inspect the output,” Andreas said. “It’s not good enough to let it off the leash and have it start emailing your colleagues.”