Alex Preston / blog

Recap of HackPrinceton

Wed Apr 20 2022

I wanted to compete in one last hackathon before I graduated in May, so the other weekend I competed with Julian LaNeve, Josh Hascall, and Isaac Rose in HackPrinceton. We had brainstormed a few ideas prior to the competition but ultimately settled on building a tool for detecting bias in the media and in one's own writing.

36 hours of coding and a few coffees later, we had created Sway: an editor, a Chrome extension, and a search engine.

At its core, the project uses GPT-3 with a customized dataset we created. OpenAI makes it fairly easy to customize GPT-3, and they provide plenty of documentation.

The customized dataset we provided to GPT-3 was essentially a collection of sentences with labels categorizing each one by its degree of bias.

{"text":"Apple computers are the best computers on the market. They are sleek, powerful, and easy to use. There is no other computer that can compare to an Apple.","label":"Extremely Biased","metadata":null}  
{"text":"Apple computers are good computers. They are not as powerful as some of the other computers on the market, but they are still a good choice for many people.","label":"Somewhat Biased","metadata":null}  
{"text":"\"I wasn't sure what to make of it\" Jonathan said","label":"Not biased","metadata":null}  
{"text":"I hate this food.","label":"Extremely Biased","metadata":null}
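Before uploading, a training file in this format can be sanity-checked with nothing but the standard library. A minimal sketch (the label set below is inferred from the examples above; the helper name is an assumption):

```python
import json

# Labels we expect in the fine-tuning data (assumed from the examples above)
VALID_LABELS = {"Extremely Biased", "Somewhat Biased", "Not biased"}

def validate_jsonl(lines):
    """Parse JSONL training records and check each has text and a known label."""
    records = []
    for i, line in enumerate(lines, start=1):
        record = json.loads(line)
        if not record.get("text"):
            raise ValueError(f"line {i}: missing 'text'")
        if record.get("label") not in VALID_LABELS:
            raise ValueError(f"line {i}: unknown label {record.get('label')!r}")
        records.append(record)
    return records

sample = ['{"text":"I hate this food.","label":"Extremely Biased","metadata":null}']
print(validate_jsonl(sample)[0]["label"])  # Extremely Biased
```

Catching a malformed line locally is much faster than waiting for a fine-tuning job to reject the file.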

This model was able to answer our two questions: (1) to what degree is a text biased, and (2) if a text is biased, why?

We then used this trained model to build a FastAPI backend serving our frontend applications.

import openai

def summarize_bias(text):
    # Build the prompt, stripping indentation and blank lines
    prompt = "\n".join([line.strip() for line in f"""
    Is this article biased? If so, please provide a summary of the bias in one paragraph.

    {text}
    """.split('\n') if line.strip()])

    response = openai.Completion.create(
        engine = 'text-davinci-002',
        prompt = f"{prompt}\n\n##\n\n",
        max_tokens = 1500,
        temperature = 0.1,
        top_p = 0.9,
        n = 1,
    )

    if response:
        # The summary is the final paragraph of the completion
        return response.choices[0].text.split("\n\n")[-1]
    else:
        raise Exception("No response from model")


import os

# Display labels for each token the fine-tuned model can return
classification_options = {
    'not_biased': 'Not biased',
    'somewhat_biased': 'Somewhat biased',
    'extremely_biased': 'Extremely biased',
}

def classify_bias_level(text):
    model_name = os.environ.get('OPENAI_CLASSIFICATION_MODEL_NAME')

    if not model_name:
        raise Exception("No model name provided")

    prompt = f'{text}\n\n###\n\n'

    response = openai.Completion.create(
        model = model_name,
        prompt = prompt,
        max_tokens = 5,
        logprobs = 3,
    )

    response_text = response.choices[0].text.lower()

    for bias_level, label in classification_options.items():
        if bias_level in response_text:
            return label

    return "Couldn't classify bias level"
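Since the completion request asks for logprobs, the token log-probabilities the API returns could also be turned into a rough confidence score for the predicted label. A sketch of the arithmetic (how you pull the per-token log-probabilities out of the response object is left out here; treat that as an assumption):

```python
import math

def label_confidence(token_logprobs):
    """Convert per-token log-probabilities into a single probability:
    summing logs multiplies the probabilities, then exp() recovers the product."""
    return math.exp(sum(token_logprobs))

# e.g. a label spanning two tokens, each predicted with probability ~0.9
print(round(label_confidence([math.log(0.9), math.log(0.9)]), 2))  # 0.81
```

A score like this could let the frontend flag classifications the model was unsure about instead of presenting them all as equally certain.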

These two functions formed the core of the backend and served all of our frontend applications: the Chrome extension, the text editor, and the search engine.

The Chrome extension works by injecting a script into the webpage to grab all of its HTML, which is then sent to our backend for processing. The injection happens entirely in JavaScript and is based on this Stack Overflow post.

The HTML is then run through readability to extract the main body of the document, and BeautifulSoup to pull the text out of the HTML tags.

from readability import Document
from bs4 import BeautifulSoup
from html import unescape

def extract_content(html):
    # readability isolates the main article body from the page
    doc = Document(unescape(html))
    summary = doc.summary()

    bs_node = BeautifulSoup(summary, 'html.parser')

    # Strip any non-content tags that survived readability
    for tag in bs_node(['style', 'script', '[document]', 'head', 'title']):
        tag.extract()

    return bs_node.getText()
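For reference, the tag-stripping half of this can be approximated with only the standard library's html.parser. This sketch skips readability's main-body detection and simply drops script, style, and head content (class and function names here are made up for illustration):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, skipping <script>, <style>, and <head> subtrees."""
    SKIP = {"script", "style", "head", "title"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside every skipped subtree
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

print(extract_text("<head><title>t</title></head><body><p>Hello</p><script>x=1</script></body>"))
# Hello
```

readability does considerably more than this (it scores and keeps only the article body), which is why we used it for the real extension.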

This final result can then be sent to our backend endpoint to find the bias level and reason.

The text editor is probably the simplest frontend application: just a React component that sends the entered text to our backend to determine whether, and why, it is biased.

Lastly, the search engine works by taking a user's search term and fetching Google's search results through the googleapiclient Python package, which gives us a URL for each result. The HTML for each URL is then fetched with the requests module, and we reuse the same code from the Chrome extension to extract the text and get a bias level. Finally, the results are sent back to the frontend, sorted from least biased to most biased.
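The final sorting step is straightforward once every result has a label. A sketch, assuming each result is a dict carrying one of the classifier's display labels under a hypothetical bias_level key:

```python
# Rank the classifier's labels from least to most biased
BIAS_ORDER = {"Not biased": 0, "Somewhat biased": 1, "Extremely biased": 2}

def sort_by_bias(results):
    """Sort search results from least to most biased; unknown labels sink to the end."""
    return sorted(results, key=lambda r: BIAS_ORDER.get(r["bias_level"], len(BIAS_ORDER)))

results = [
    {"url": "https://a.example", "bias_level": "Extremely biased"},
    {"url": "https://b.example", "bias_level": "Not biased"},
    {"url": "https://c.example", "bias_level": "Somewhat biased"},
]
print([r["url"] for r in sort_by_bias(results)])
# ['https://b.example', 'https://c.example', 'https://a.example']
```

sorted() is stable, so results with the same bias level keep Google's original ranking among themselves.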

If you're interested in digging through the code, check out our GitHub repo.