Data is the information that drives business. It can be structured in rows and columns, like a customer's name, address, and phone number, or unstructured, such as an email or a social media post. Structured data lives in relational database management systems such as those from Oracle, IBM, and Microsoft, as well as the open-source PostgreSQL and MySQL, among others, and it is accessed using the standard Structured Query Language (SQL). Unstructured data resides in what are called NoSQL databases, such as Cassandra, Couchbase, MongoDB, and many others. Many organizations today run both kinds of databases.
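A minimal sketch of that distinction, in Python. The table, column names, and sample records are illustrative only and not tied to any particular product: structured data fits a fixed schema and answers to SQL, while unstructured (or semi-structured) data is the free-form document shape a NoSQL store typically holds.

```python
import sqlite3
import json

# Structured data: fixed rows and columns, queried with standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?, ?)",
             ("Ada Example", "1 Main St", "555-0100"))
for row in conn.execute("SELECT name, phone FROM customers"):
    print(row)

# Unstructured data: a free-form document with no fixed schema, the kind of
# payload a document-oriented NoSQL database would store.
post = {"author": "ada", "text": "Loving the new release!", "tags": ["launch"]}
print(json.dumps(post))
```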
Once the data is stored, it must be easily retrievable, findable amid the mountains of data organizations collect, and available at scale. Numerous tools exist for those jobs, including Hadoop and Apache Spark, among many others. It is through the collection and analysis of data that businesses make the decisions that affect their bottom line.
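As one example of analysis at scale, here is a small PySpark sketch that reads a dataset and rolls it up into a summary. It assumes PySpark is installed; the file path and column names are hypothetical placeholders, not a reference to any specific system.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-rollup").getOrCreate()

# Load raw records; Spark distributes the work across however many cores
# or cluster nodes are available.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Roll up revenue per customer so decisions rest on a summarized view.
revenue = (orders
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("total_spent"))
           .orderBy(F.desc("total_spent")))

revenue.show(10)
spark.stop()
```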
Google has announced that it is open-sourcing a new Java-based differential privacy library called PipelineDP4J. Differential privacy, according to Google, is a privacy-enhancing technology (PET) that “allows for analysis of datasets in a privacy-preserving way to help ensure individual information is never revealed.” This enables researchers or analysts to study a dataset without accessing … continue reading
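This is not PipelineDP4J's API (that library is JVM-based); it is only a toy Python illustration of the core idea behind differential privacy: publish a noisy aggregate so that no individual's contribution can be pinned down. The dataset and epsilon value are made up.

```python
import random

def dp_count(records, epsilon=1.0):
    """Return a count with Laplace noise calibrated to a sensitivity of 1."""
    true_count = len(records)
    scale = 1.0 / epsilon  # any one person changes the count by at most 1
    # The difference of two exponential draws with the same scale is Laplace noise.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

visits = ["alice", "bob", "carol", "dave"]
print(round(dp_count(visits, epsilon=0.5), 2))  # close to 4, but never exact
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy.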
Having the correct customer information in your databases is necessary for a number of reasons, but especially when it comes to active contact information like email addresses or phone numbers. “Data errors cost users time, effort, and money to resolve, so validating phone numbers allows users to spend those valuable resources elsewhere,” explained John DeMatteo, … continue reading
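A minimal validation sketch, not the approach described in the article: normalize a raw phone entry and check that it looks like a plausible 10-digit North American number before it ever reaches the database. The function name and rules are illustrative assumptions.

```python
import re

def normalize_phone(raw: str):
    """Return a bare 10-digit string, or None if the input can't be salvaged."""
    digits = re.sub(r"\D", "", raw)          # strip spaces, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop a leading country code
    return digits if len(digits) == 10 else None

print(normalize_phone("(555) 010-4477"))      # -> 5550104477
print(normalize_phone("call me maybe"))       # -> None
```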
Microsoft has announced and is open-sourcing a new data processing system called Drasi that can detect and react to changes in complex systems. This new project “simplifies the automation of intelligent reactions in dynamic systems, delivering real-time actionable insights without the overhead of traditional data processing methods,” Mark Russinovich, CTO, deputy chief information security officer, … continue reading
MongoDB has announced the release of the latest version of its database platform—MongoDB 8.0. According to the company, this release offers significant performance improvements compared to MongoDB 7.0, such as 36% better read throughput, 56% faster bulk writes, 20% faster concurrent writes during replication, and 200% faster handling of higher volumes of time series data, … continue reading
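For orientation, this is the kind of bulk-write workload those figures describe, sketched with PyMongo; it is not MongoDB's benchmark. It assumes a local mongod instance and a throwaway database, and the collection and field names are placeholders.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.demo.events

docs = [
    {"device": f"sensor-{i % 10}", "value": i * 0.1,
     "ts": datetime.now(timezone.utc)}
    for i in range(10_000)
]

# One bulk call instead of 10,000 single inserts keeps network round trips down.
result = events.insert_many(docs, ordered=False)
print(len(result.inserted_ids), "documents written")
```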
Google is announcing several new Chrome features aimed at better protecting users as they browse the web. Safety Check — a tool that checks for compromised passwords, Chrome updates, and other potential security issues in the browser — has been updated to run automatically in the background so that it can be more proactive in … continue reading
Organizations are getting caught up in the hype cycle of AI and generative AI, but in many cases they don’t have the data foundation needed to execute AI projects. A third of executives think that less than 50% of their organization’s data is consumable, underscoring that many organizations aren’t prepared for AI. … continue reading
Time series data is essential to keeping IoT devices like smart cars or medical equipment working properly, because it captures measurements anchored to points in time. To learn more about the crucial role time series data plays in today’s connected world, we invited Evan Kaplan, CEO of InfluxData, onto our podcast to … continue reading
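A small sketch of what makes data "time series": every reading carries a timestamp, so values can be ordered, windowed, and averaged over time. The sensor names, intervals, and values are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

start = datetime(2025, 1, 1, tzinfo=timezone.utc)

# One engine-temperature reading every 10 seconds.
readings = [
    {"ts": start + timedelta(seconds=10 * i), "temp_c": 90 + i * 0.5}
    for i in range(6)
]

# Because each value is tied to a timestamp, we can ask time-based questions,
# such as the average over the most recent 30-second window.
window_start = readings[-1]["ts"] - timedelta(seconds=30)
recent = [r["temp_c"] for r in readings if r["ts"] >= window_start]
print(f"avg over last 30s: {sum(recent) / len(recent):.2f} C")
```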
Pinecone, a vector database for scaling AI, is introducing a new bulk import feature to make it easier to ingest large amounts of data into its serverless infrastructure. According to the company, this new feature, now in early access, is useful in scenarios when a team would want to import over 100 million records (though … continue reading
With businesses uncovering more and more use cases for artificial intelligence and machine learning, data scientists find themselves looking closely at their workflows. There are myriad moving pieces in AI and ML development, and they all must be managed with an eye toward efficiency, flexibility, and robustness. The challenge now is to … continue reading
The open-source distributed PostgreSQL platform, pgEdge, has a new release with advanced logical replication features, large object support, and improved error handling. “These enhancements make pgEdge an even more powerful alternative for legacy multi-master replication technologies, offering greater throughput, flexibility, and control for users,” Phillip Merrick, co-founder and CEO of pgEdge, wrote in a blog … continue reading
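pgEdge builds its distributed, multi-master capabilities on top of PostgreSQL, so standard Postgres logical replication is a useful mental model; the commands below are plain PostgreSQL run through psycopg2, not pgEdge-specific features, and the hostnames, database, and table names are placeholders.

```python
import psycopg2

# On the publishing node: announce which table(s) to replicate.
pub = psycopg2.connect("dbname=app host=node-a user=replicator")
pub.autocommit = True
with pub.cursor() as cur:
    cur.execute("CREATE PUBLICATION app_pub FOR TABLE customers;")

# On the subscribing node: stream changes from the publisher as they happen.
sub = psycopg2.connect("dbname=app host=node-b user=replicator")
sub.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction block
with sub.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION app_sub "
        "CONNECTION 'host=node-a dbname=app user=replicator' "
        "PUBLICATION app_pub;"
    )
```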
Good teamwork is key to any successful AI project, but combining data scientists and software engineers into an effective force is no easy task. According to Gartner, 30 percent of AI projects will be abandoned by the end of 2025 due to factors such as poor data quality, escalating costs, and a lack of business … continue reading
MongoDB is launching a new technology stack to enable customers to build AI applications. The MongoDB AI Applications Program (MAAP) will feature reference architectures, integrations with leading AI technology providers, and a support system for customers featuring access to experts and education. According to MongoDB, many customers have reported that they lack the multi-modal data … continue reading