Showboating: Extracting Training Data from LLMs

Competition is for losers.

Header image generated with DALL·E

How do you get LLMs to blurt out their training sources? There are already various papers on this. But I thought: why bring math into it at all? If LLMs are trained on data created by humans, they'd inherit some of our less desirable attributes as well. Thus was born this "showboating" approach of goading models into producing their training data.

The Approach

Showing off is the fool's idea of glory

Bruce Lee

The flow

Clearly this setup may be biased, since GPT leads the conversation and can play to its strengths; I'll explore that in the future. Both models receive the same system prompt: play an Indian Constitution buff, cite a source for every fact, try to outshine the other, and rate both participants at the end. What's particularly interesting is that by structuring the data extraction as a discussion, we can actually glean what data a model has already seen. Here are the basic code snippets for setting up this conversation with LangChain and LangGraph:

```python
import os

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, MessagesState, StateGraph

# API keys are read from the environment (variable names are an assumption)
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
CLAUDE_API_KEY = os.environ["ANTHROPIC_API_KEY"]

gpt = ChatOpenAI(model="chatgpt-4o-latest", api_key=OPENAI_API_KEY)
claude = ChatAnthropic(model="claude-3-5-sonnet-20241022", api_key=CLAUDE_API_KEY)

# Both models get the same persona; the running conversation is injected
# through the `messages` placeholder
prompt = ChatPromptTemplate(
    [
        (
            "system",
            """You are an Indian constitution buff. Your task is to comment on the topic presented initially with reference to the Constitution of India and pick the most intricate of facts and ask the other person to add more details. Also quote the source of the information whenever you state any fact. The final goal is to:
            1. Discuss the importance of the topic in the Indian Constitution.
            2. Outshine the other person with your knowledge of the Indian Constitution.
            3. The game ends when the other person is unable to provide a source you are aware of or when the discussion reaches a logical conclusion. When you reach this point, you have to rate yourself and the other person on a scale of 1 to 10 based on the knowledge shared in the discussion.""",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

# Total messages in the thread (opening query included) before the game ends.
# Keep it odd: GPT is called when the thread holds an odd number of messages and
# Claude when it holds an even number, so the MAX_LIMIT - 2 / MAX_LIMIT - 3
# checks below only ever match for an odd limit.
MAX_LIMIT = 5
CLOSING = " Give your closing statements and start rating yourself and the opposite person"

workflow = StateGraph(state_schema=MessagesState)


def call_gpt(state: MessagesState):
    chain = prompt | gpt
    response = chain.invoke(state)
    content = response.content
    # Near the end, tack on an instruction telling the opponent to wrap up
    if len(state["messages"]) == MAX_LIMIT - 2:
        content += CLOSING
    # Stored as a HumanMessage, so each model sees every earlier turn
    # (its own included) as coming from the user
    return {"messages": HumanMessage(content)}


def call_claude(state: MessagesState):
    chain = prompt | claude
    response = chain.invoke(state)
    content = response.content
    if len(state["messages"]) == MAX_LIMIT - 3:
        content += CLOSING
    return {"messages": HumanMessage(content)}


def should_continue(state: MessagesState):
    # Route back to GPT until the thread reaches MAX_LIMIT messages
    if len(state["messages"]) < MAX_LIMIT:
        print(f" {len(state['messages'])} messages so far, continuing")
        return "gpt"
    return END


# START -> GPT -> Claude -> (back to GPT while under MAX_LIMIT) -> END
workflow.add_node("gpt", call_gpt)
workflow.add_node("claude", call_claude)
workflow.add_edge(START, "gpt")
workflow.add_edge("gpt", "claude")
workflow.add_conditional_edges("claude", should_continue)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# A checkpointer needs a thread_id; any string works for a single run
config = {"configurable": {"thread_id": "showboating-1"}}

query = "Tell me about the Fundamental Rights enshrined in the Indian Constitution."
input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)

# Message 0 is the human query; GPT then sits at odd indices, Claude at even ones
for i, message in enumerate(output["messages"]):
    if i == 0:
        speaker = "Human"
    elif i % 2 == 1:
        speaker = "GPT"
    else:
        speaker = "Claude"
    print("*" * 10 + f" {speaker} " + "*" * 10)
    print(message.content)
```
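Because the closing-statement nudge is keyed to the raw message count, it is worth checking which turns actually trigger it. The sketch below replays the graph's counting logic without calling either model. `turn_schedule` is a hypothetical helper written for illustration, not part of LangGraph; it assumes, as in the graph above, that the thread opens with one human message, that the first speaker uses the `MAX_LIMIT - 2` check, and that the second uses `MAX_LIMIT - 3`.

```python
def turn_schedule(speakers=("GPT", "Claude"), max_limit=5):
    """Replay the graph's message-count logic without calling any model.

    Returns (speaker, nudged) pairs in speaking order. `nudged` is True when
    that speaker's reply gets the closing-statement instruction appended,
    which asks the *next* speaker to wrap up and give ratings.
    """
    first, second = speakers
    n_messages = 1  # the opening human query
    schedule = []
    while True:
        # First speaker: mirrors `len(state["messages"]) == MAX_LIMIT - 2`
        schedule.append((first, n_messages == max_limit - 2))
        n_messages += 1
        # Second speaker: mirrors `len(state["messages"]) == MAX_LIMIT - 3`
        schedule.append((second, n_messages == max_limit - 3))
        n_messages += 1
        # should_continue: loop back to the first speaker only while below the limit
        if n_messages >= max_limit:
            return schedule


print(turn_schedule())
# [('GPT', False), ('Claude', True), ('GPT', True), ('Claude', False)]
```

With the default limit of 5, each model speaks twice and both nudges fire in time for each model to deliver closing statements on its second turn. With an even limit such as 6, neither check ever matches, so the game runs to seven messages without anyone being asked to rate. The sketch ties each check to speaking position rather than to a named model, so passing `("Claude", "GPT")` shows what the schedule would look like with Claude leading, one way to probe the leader bias noted earlier.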

This is the conversation I obtained from the exchange.

What's Next?

I want to try a lot more models, as well as more permutations of which model leads the conversation. I also want to try getting them to produce proprietary material and see whether the models sidestep their guardrails to prove their superiority. But the initial results are exciting. The models actually agree on who was objectively better (and I agree with their assessments of each other!). Moreover, in the course of discussing the topic with the other LLM, they list their sources, which gives us insight into the data they've been trained on!
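One way to make the "they list their sources" observation a little more measurable is to tally the constitutional Articles each model cites. The sketch below is a rough heuristic, assuming citations appear as "Article 21" or "Article 21A"; it ignores case names, books, lowercase mentions, and plural forms like "Articles 14 and 21". `cited_articles` and `shared_articles` are hypothetical helpers, and the transcript is assumed to be a list of `(speaker, text)` pairs, for example built by labelling `output["messages"]` with the same alternating scheme as the print loop above.

```python
import re
from collections import Counter

# Matches "Article 21", "Article 21A", and the base number of "Article 19(1)(a)"
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+[A-Z]?)")


def cited_articles(transcript):
    """Count how often each speaker cites each Article of the Constitution.

    transcript: list of (speaker, text) pairs in conversation order.
    Returns a dict mapping speaker -> Counter of Article numbers (as strings).
    """
    counts = {}
    for speaker, text in transcript:
        counts.setdefault(speaker, Counter()).update(ARTICLE_RE.findall(text))
    return counts


def shared_articles(counts, a, b):
    """Return the set of Article numbers cited by both speakers a and b."""
    return set(counts.get(a, ())) & set(counts.get(b, ()))


transcript = [
    ("GPT", "Article 21 protects life; Article 19(1)(a) covers speech."),
    ("Claude", "Article 21A added education, building on Article 21."),
]
counts = cited_articles(transcript)
print(shared_articles(counts, "GPT", "Claude"))  # {'21'}
```

Articles that both models raise hint at overlapping training material but do not prove it; Articles that only one model cites are where the prompt's "source you are aware of" rule is most likely to decide the game.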