How to identify AI-Generated Content, according to ChatGPT and me

what is the difference between AI written content and content written by a human? Can you explain this and explain how to identify AI written content
I asked ChatGPT, and this is what happened.

AI generated content and content written by a human can be quite different in many ways. One of the main differences is that AI generated content is created using algorithms and machine learning techniques, while human-written content is created by a person using their own knowledge, skills, and experiences.

One way to identify AI-written content is by looking for patterns and repetitions. For example, AI-generated content may use the same phrases or sentence structures multiple times, while human-written content is more likely to vary in its wording and structure. Additionally, AI-generated content may contain errors or inconsistencies that would not be present in human-written content.

Another way to identify AI-written content is by analyzing the level of complexity and abstractness. AI-generated content often lacks the level of complexity and abstract reasoning that is present in human-written content. It may also lack the nuance and subtlety that is often found in human-written content. This can make AI-generated content seem less natural or less engaging than content written by a human.

Finally, one can look at the coherence, cohesiveness and the flow of the text. AI-generated text may not have a clear structure and may jump from topic to topic without a clear transition. It may also lack the logical flow of human-written text, making it difficult to follow the argument or main point.

To conclude, AI-generated content and human-written content can be quite different in many ways, and there are several ways to identify AI-generated content. Some key markers include patterns and repetitions, complexity and abstractness, style, and coherence and cohesiveness. While AI-generated content has its own use cases, such as generating a large amount of content quickly or creating content in languages that a person may not be fluent in, it cannot replace the creativity, nuance and personal touch of human-written content.

ChatGPT AI generated Content

Unless you have been living off-grid since November 2022, you will no doubt have heard of ChatGPT. The tool is amazing, new, and more advanced than any AI we have seen before. And it seems everyone and their computer has been tweeting, blogging and posting about it. It is not necessarily new thinking. Many have written on how it will change the world, more recently on how it will replace writers, and even on how it is the end of humanity.

Can you spot if an article was written by AI?

In case you missed it, the first 5 paragraphs of this article were written by ChatGPT.

So, how do you tell the difference between AI-written content and content written by humans?

Here are some factors we can consider.

Perplexity

Perplexity (roughly, the randomness of the text) is a measure of how well a language model like ChatGPT can predict a sample text: the lower the perplexity, the more predictable the text is to the model. Texts with higher perplexity are more likely to have been written by humans.
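To make the idea concrete, here is a minimal Python sketch of the standard perplexity formula: the exponential of the average negative log-probability per token. The probabilities are invented for illustration. A real detector would get them from a language model, and I am not claiming this is how any particular tool computes its score.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token.
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities, made up for this example.
predictable_text = [0.9, 0.8, 0.85, 0.9]   # the model guessed most words easily
surprising_text = [0.2, 0.05, 0.4, 0.1]    # the model was often surprised

print("predictable:", perplexity(predictable_text))
print("surprising:", perplexity(surprising_text))
```

The second list produces the higher number, which is the pattern the paragraph above associates with human writing.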

The chart below shows the sentence-by-sentence perplexity of a sample of this article tested in GPTZero (see below).

Abstractness

In writing and language, abstract terms describe complex and often ambiguous topics used in everyday language, and an abstraction is a concept or idea that is not concrete or tangible. Abstractness is a measure of how we write about feelings, emotions, ideas, and concepts. This is what makes writing interesting to read and powerful.

Patterns & Repetitions

Deep learning and natural language processing technologies harness millions of existing patterns in the databases that AI uses to ‘learn’. So while these patterns exist, or have existed, in human writing, they are amplified in AI-generated content.

It may be easy to understand that we can spot repetitions in text: the same ideas repeated, or the same phrases used throughout an article, would be easy to spot. But research shows that, as readers, we are equally aware and appreciative when the opposite occurs. The academic paper Avoiding Repetition in Generated Text (Foster & White, 2006) concluded that “In the human evaluation, participants were asked to give direct judgements on the quality of generated output presented as text: their responses indicated that they both were aware of and appreciated the variation in the output.”
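One rough way to put a number on repetition is to count how many short word sequences (n-grams) occur more than once. The Python sketch below is my own illustration, not a method taken from the paper above. The function name, the three-word window and the sample sentences are all assumptions.

```python
import re
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Share of word n-grams that occur more than once in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Invented sample sentences, for illustration only.
repetitive = ("AI content may use the same phrases. "
              "AI content may use the same structures.")
varied = "Humans ramble, change direction, and rarely say things twice."

print("repetitive:", repeated_ngram_ratio(repetitive))
print("varied:", repeated_ngram_ratio(varied))
```

A higher ratio means more recycled phrasing. On its own it proves nothing about who wrote the text, but it is a cheap first signal.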

Burstiness

Similar to patterns and repetitions in the text, AI writing is likely to keep a similar level of perplexity from one sentence to the next. “Burstiness” describes how much the perplexity varies between sentences. So to identify if content was written by AI, we can try to measure the spikes of perplexity across sentences. A bot will likely have a similar degree of perplexity from sentence to sentence, but a human is going to write with more spikes: maybe a long, complex sentence followed by a shorter one. Like this. And this is often a method that copywriters use to keep the interest of readers.
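As a sketch of how those spikes could be measured, the Python below treats burstiness as the standard deviation of per-sentence perplexity. That definition is my assumption for illustration, since I do not know the exact formula any detection tool uses, and the perplexity values are invented.

```python
import statistics


def burstiness(sentence_perplexities):
    """Spread of per-sentence perplexity (population standard deviation)."""
    if len(sentence_perplexities) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(sentence_perplexities)


# Hypothetical per-sentence perplexities, made up for this example.
bot_like = [21.0, 22.5, 20.8, 21.7]      # steady from sentence to sentence
human_like = [18.0, 95.0, 12.0, 60.0]    # spiky: long sentence, then a short one

print("bot-like:", burstiness(bot_like))
print("human-like:", burstiness(human_like))
```

The spiky list scores far higher, matching the idea that human writing swings more from sentence to sentence.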

Testing Tools

Giant Language Model Test Room

Researchers from the MIT-IBM Watson AI Lab and the Harvard NLP group have created a free tool to help detect AI-written content. The tool is called the Giant Language Model Test Room (or GLTR) and will give you an indication of how likely it is that the text was AI-written. It is based on GPT-2, so it may not be as effective in identifying content written by ChatGPT, which is built on the later GPT-3.5 series of models.

However, I tested it using the first 2 paragraphs of this article, written by ChatGPT, and it displayed a lot of green, indicating that it was likely AI-generated content.

[Figure: AI-generated content tested in the Giant Language Model Test Room]

The tool indicates that the GPT-3 text on the left is AI-generated, in contrast to my own conclusion from this article on the right. For comparison, see below GLTR's own GPT-2 test text alongside its test text from a NYT article. The GPT-2 text on the left uses a top 10 predicted word 100% of the time.
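The "top 10 predicted word" figure can be thought of as a simple share: for each word in the text, check whether it was among the model's ten most likely guesses at that position. The Python sketch below only illustrates that arithmetic. It is not GLTR's code, the function name is hypothetical, and the ranks are invented rather than produced by a real model.

```python
def top_k_share(ranks, k=10):
    """Fraction of words whose rank in the model's predictions is within k."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks (1 = the model's most likely next word), made up here.
machine_like_ranks = [1, 3, 2, 1, 7, 4, 1, 2]
human_like_ranks = [1, 48, 5, 230, 2, 1500, 9, 61]

print("machine-like:", top_k_share(machine_like_ranks))
print("human-like:", top_k_share(human_like_ranks))
```

A share close to 1 corresponds to the wall of green described above, while human text tends to include more words the model ranked far lower.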

GPTZero

Here we have the same paragraphs from this article tested in GPTZero, which successfully identifies the ChatGPT text. To be clear, GPTZero was built and trained on GPT-3, so I am not saying it is a better tool. As always, you need to use the right tool for the job!

Conclusion

The first 5 paragraphs of this article were written by ChatGPT in response to my question “what is the difference between AI written content and content written by a human? Can you explain this and explain how to identify AI written content”.

There are some good points, but I found this to be better:

To determine whether text was written by artificial intelligence, the app tests a calculation of “perplexity” – which measures the complexity of a text, and “burstiness” – which compares the variation of sentences. (Read full text here)

… it is not surprising, and that is the point.

So, will human-written content always be different to AI content? Maybe not! From a creative, emotional, and personable viewpoint, probably. But when you combine the two, it becomes very powerful, and I am convinced that this is the future. When human writers leverage the time-saving, idea-generating power of AI technology, we will see a dramatic change in the industry. As always, some will get left behind and some will benefit.
