Multimodal AI: the basics
Let’s start with modes. Think of a mode as something like a human sense. Suppose you can both see and taste a carrot. You would identify it faster than if you had to eat it blindfolded, and you could also identify it by sight alone, without tasting it. If it were not carrot-shaped (e.g. a purée), you might still guess it was carrot from the colour, and if you could taste the purée as well, the flavour would confirm it. That’s multimodal AI in a nutshell: a model combines different kinds of input, and can infer a more accurate result from several inputs together than from any one of them alone.
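To make the carrot example concrete, here is a minimal sketch of one simple way two modes could be combined: each mode gives its own probability for each candidate food, and the two sets of probabilities are multiplied and renormalised. This is an illustration only, not how any particular multimodal model works. The function name `fuse`, the labels and all the numbers are made up for the example, and multiplying treats the two modes as independent evidence, which is a simplifying assumption.

```python
def fuse(*modalities):
    """Combine per-mode probabilities by multiplying and renormalising.

    Each argument is a dict mapping a label to that mode's probability.
    Assumes every mode scores the same set of labels and that the modes
    provide independent evidence (a simplification).
    """
    labels = list(modalities[0].keys())
    scores = {}
    for label in labels:
        score = 1.0
        for probs in modalities:
            score *= probs[label]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical numbers for an orange purée:
# the colour alone is ambiguous, the flavour is more telling.
sight = {"carrot": 0.6, "pumpkin": 0.4}
taste = {"carrot": 0.9, "pumpkin": 0.1}

fused = fuse(sight, taste)
print(fused)
```

With these made-up numbers, the combined estimate for "carrot" comes out higher than either sight or taste gives on its own, which is the point of the analogy: two agreeing senses are more convincing than one.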