Behind the responses of genAI models are human evaluators who assess the accuracy of those responses, but a report published this week casts doubt on the process.
According to a story published Wednesday, contractors working on Google Gemini are now being asked to evaluate AI prompts and responses in areas in which they have no expertise, rather than being allowed to skip them as they could before.
This goes against the “Build Responsibly” section of the Gemini 2.0 announcement, which said: “As we develop these new technologies, we recognize the responsibility involved and the many safety questions that AI agents open up. That’s why we are taking an exploratory, incremental approach to development, conducting research on multiple prototypes, iteratively implementing safety training, working with trusted testers and third-party experts, and conducting comprehensive risk assessments and safety and assurance evaluations.”
Mismatch Raises Questions
According to TechCrunch, “A new internal guideline passed on by Google to contractors working on Gemini has raised concerns that Gemini could be more likely to reveal inaccurate information about highly sensitive topics, such as healthcare, to ordinary people.”
It said the new guideline reads: “You should not skip prompts that require domain expertise.” Instead, contractors are instructed to rate the parts they understand and add a note that they lack the necessary domain knowledge for the rest.
A blog appearing on Artificial Intelligence+ on Thursday noted that while “the contractors hired by Google to support Gemini are key players in the evaluation process… one of the challenges is that [they] are often asked to evaluate answers that might fall outside their own areas of expertise. For example, while some may have a technical background, AI can produce results related to literature, finance, healthcare, or even scientific research.”
It added, “This mismatch raises questions about how effectively human oversight can serve to validate AI-generated content in various fields.”
However, Google noted in a subsequent statement to TechCrunch that “reviewers” not only review content, but “provide valuable feedback on style, formatting, and other factors.”
genAI ‘Hidden Component’
When organizations look to leverage an AI model, it’s important to reflect on the principles of responsible AI, Thomas Randall, research leader at Info-Tech Research Group, said Thursday.
He said there is “a hidden component to the generative AI market landscape: companies that fall under the guise of ‘reinforcement learning from human feedback (RLHF)’. These companies, like Appen, Scale AI, and Clickworker, rely on an economy of millions of crowd workers for the data production and training of the AI algorithms found in OpenAI, Anthropic, Google, and others. The RLHF companies raise fair labor practice concerns and receive poor ratings from Fairwork.”
Last year, Fairwork, which describes itself as an “action research project that aims to shed light on how technological changes affect working conditions around the world,” published a set of AI Principles which, it said, “evaluate the working conditions behind the development and deployment of AI systems in the context of an employment relationship.”
There is, it stated at the time, “nothing ‘artificial’ about the immense amount of human labor that builds, supports and maintains AI products and services. Many workers interact with AI systems in the workplace, and many others perform the critical data work that underpins the development of AI systems.”
Questions to Ask
The executive team of an organization looking to leverage an AI model, Randall said, needs to ask itself a variety of questions, such as: “Does the AI model you are using depend on or use an RLHF company? If so, was the pool of crowd workers diverse enough, and did it provide enough expertise? How opaque was the training process for the models you are using? Can you trace how the data was produced? If the AI vendor does not know the answers to these questions, the organization must be prepared to take responsibility for any results the AI models provide.”
Paul Smith-Goodson, vice president and principal analyst at Moor Insights & Strategy, added that it is vitally important that retrieval-augmented generation (RAG) be implemented, “because AI models do hallucinate, and it is a way to make sure that the language models are drawing on the correct information.”
He echoed Rick Villars, vice president of global research at IDC, who earlier this year said: “Increasingly, solutions around RAG (and that will allow people to use them more effectively) will focus on linking to the correct data that is of business value, rather than just raw productivity improvements.”
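To make the idea concrete, here is a minimal sketch of the retrieval step that RAG places in front of a language model: rank a small document store against the user’s question and ground the prompt in the top matches rather than in the model’s memory alone. The toy corpus, function names, and bag-of-words scoring are illustrative assumptions for this article, not any vendor’s actual implementation.

```python
# Minimal RAG retrieval sketch using only the Python standard library.
# The corpus, scoring scheme, and prompt template are illustrative assumptions.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Represent text as a bag-of-words vector (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Ground the model's answer in retrieved text instead of its parametric memory."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        "The Gemini 2.0 announcement describes an exploratory, incremental approach to development.",
        "Contractors rate model responses for accuracy, style, and formatting.",
        "Fairwork publishes principles on the working conditions behind AI systems.",
    ]
    print(build_prompt("How are Gemini responses evaluated?", corpus))
```

The point of the sketch is the one Villars makes: the value comes from linking the model to the right data at answer time, which is why production systems swap the toy scoring above for real embeddings and a curated, business-relevant document store.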
A ‘corrosive effect’ on workers
Ryan Clarkson, managing partner of Malibu, California-based Clarkson Law Firm, said the rapid growth of generative AI as a business has had corrosive effects on tech workers around the world.
For example, last week workers filed a class-action lawsuit through the firm against AI data-processing company Scale AI, whose services include providing the human labor that labels the data used to train AI models and to shape their responses to queries.
The lawsuit alleges poor working conditions and exploitative behavior on Scale’s part, and also says the company misclassified the workers responsible for generating much of its product as independent contractors rather than employees.